Re: [ovirt-users] VMs stuck in migrating state

2018-03-05 Thread nicolas

On 2018-03-02 15:34, Milan Zamazal wrote:
> nico...@devels.es writes:
>
>> On 2018-03-02 14:10, Milan Zamazal wrote:
>>> nico...@devels.es writes:
>>>
>>>> We're running 4.1.9 and during the weekend we had a storage issue that
>>>> seemed to leave some hosts in a strange state. One of the hosts has
>>>> almost all VMs marked as migrating (although it doesn't seem to
>>>> actually be migrating them) and the migration state cannot be
>>>> cancelled.
>>>>
>>>> When clicking on one of those machines and selecting 'Cancel
>>>> migration', in the ovirt-engine log I see:
>>>>
>>>> 2018-02-26 08:52:07,588Z INFO
>>>> [org.ovirt.engine.core.vdsbroker.vdsbroker.CancelMigrateVDSCommand]
>>>> (org.ovirt.thread.pool-6-thread-36) [887dfbf9-dece-4f7b-90a8-dac02b849b7f]
>>>> HostName = host2.domain.com
>>>> 2018-02-26 08:52:07,588Z ERROR
>>>> [org.ovirt.engine.core.vdsbroker.vdsbroker.CancelMigrateVDSCommand]
>>>> (org.ovirt.thread.pool-6-thread-36) [887dfbf9-dece-4f7b-90a8-dac02b849b7f]
>>>> Command 'CancelMigrateVDSCommand(HostName = host2.domain.com,
>>>> CancelMigrationVDSParameters:{runAsync='true',
>>>> hostId='e63b9146-10c4-47ad-bd6c-f053a8c5b4eb',
>>>> vmId='26d37e43-32e2-4e55-9c1e-1438518d5021'})' execution failed:
>>>> VDSGenericException: VDSErrorException: Failed to CancelMigrateVDS, error =
>>>> Migration process cancelled, code = 82
>>>>
>>>> On the vdsm side I see:
>>>>
>>>> 2018-02-26 08:56:19,396+0000 INFO  (jsonrpc/0) [vdsm.api] START
>>>> migrateCancel() from=::ffff:10.X.X.X,54654,
>>>> flow_id=874d36d7-63f5-4b71-8a4d-6d9f3ec65858 (api:46)
>>>> 2018-02-26 08:56:19,398+0000 INFO  (jsonrpc/0) [vdsm.api] FINISH
>>>> migrateCancel return={'status': {'message': 'Migration process
>>>> cancelled', 'code': 82}, 'progress': 0} from=::ffff:10.X.X.X,54654,
>>>> flow_id=874d36d7-63f5-4b71-8a4d-6d9f3ec65858 (api:52)
>>>>
>>>> So no error on the vdsm side.
>>>
>>> Interesting.  The messages above indicate that an attempt was made to
>>> migrate the VM, but the migration was temporarily rejected on the
>>> destination due to the number of already running incoming migrations
>>> (the limit is 2 incoming migrations by default).  Later, Vdsm was asked
>>> to cancel the outgoing migration and it successfully set a migration
>>> canceling flag.  However, the action was reported as an error to
>>> Engine, due to hitting the incoming migration limit on the destination.
>>> Maybe it's a bug, I'm not sure, resulting in minor confusion.  Normally
>>> it shouldn't matter; the migration should be canceled shortly
>>> afterwards anyway and Engine should be informed about that.
>>>
>>> However, the migration apparently wasn't canceled here.  I can't say
>>> what happened without the complete Vdsm log.  One possible reason is
>>> that the migration has been waiting for the completion of another
>>> migration outgoing from the source (only one outgoing migration at a
>>> time is allowed by default).  In any case it seems the migration either
>>> wasn't actually started at all, or it just started being set up and was
>>> never completely finished.
>>
>> I'm attaching the log. Basically the storage backend was restarted by
>> fencing and then this issue happened. This was on 26/02 at about 08:52
>> (log time).
>
> Thank you for the log, but the VMs are already “migrating” at its
> beginning, so there must have been some problem even earlier.
>
>>>> I already tried restarting ovirt-engine but it didn't work.
>>>
>>> Here the problem is clearly on the Vdsm side.
>>>
>>>> Could someone shed some light on how to cancel the migration status
>>>> for these machines? All of them seem to be running on the same host.
>>>
>>> Did the VMs get unblocked in the meantime?  I can't know what the
>>
>> No, they didn't. They're still in a "Migrating" state.
>>
>>> actual state of the given VMs is without seeing the complete Vdsm log,
>>> so it's difficult to give good advice.  I think that a Vdsm restart on
>>> the given host would help, BUT it's generally not a very good idea to
>>> restart Vdsm if any real migration, outgoing or incoming, is running on
>>> the host.  VMs that aren't actually being migrated at all (despite
>>> being reported as migrating) should simply return to Up state after the
>>> restart, but VMs with a real migration action pending might return to
>>> Up state without proper cleanup, resulting in a different kind of mess
>>> or maybe something even worse (things should improve in oVirt 4.2, but
>>> it's still good to avoid Vdsm restarts with migrations running).
>>
>> I assume this is not a real migration, as it has been in this state for
>> several days. Would you advise restarting vdsm in this case then?
>
> I'd say try it.  Since nothing has changed for several days, restarting
> Vdsm looks like an appropriate action at this point.  Just don't make a
> habit of it :-).

Thanks, that did it.

Regards.

> Regards,
> Milan

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] VMs stuck in migrating state

2018-03-02 Thread Milan Zamazal
nico...@devels.es writes:

> On 2018-03-02 14:10, Milan Zamazal wrote:
>> nico...@devels.es writes:
>>
>>> We're running 4.1.9 and during the weekend we had a storage issue that
>>> seemed to leave some hosts in a strange state. One of the hosts has
>>> almost all VMs marked as migrating (although it doesn't seem to
>>> actually be migrating them) and the migration state cannot be
>>> cancelled.
>>>
>>> When clicking on one of those machines and selecting 'Cancel migration',
>>> in the ovirt-engine log I see:
>>>
>>> 2018-02-26 08:52:07,588Z INFO
>>> [org.ovirt.engine.core.vdsbroker.vdsbroker.CancelMigrateVDSCommand]
>>> (org.ovirt.thread.pool-6-thread-36) [887dfbf9-dece-4f7b-90a8-dac02b849b7f]
>>> HostName = host2.domain.com
>>> 2018-02-26 08:52:07,588Z ERROR
>>> [org.ovirt.engine.core.vdsbroker.vdsbroker.CancelMigrateVDSCommand]
>>> (org.ovirt.thread.pool-6-thread-36) [887dfbf9-dece-4f7b-90a8-dac02b849b7f]
>>> Command 'CancelMigrateVDSCommand(HostName = host2.domain.com,
>>> CancelMigrationVDSParameters:{runAsync='true',
>>> hostId='e63b9146-10c4-47ad-bd6c-f053a8c5b4eb',
>>> vmId='26d37e43-32e2-4e55-9c1e-1438518d5021'})' execution failed:
>>> VDSGenericException: VDSErrorException: Failed to CancelMigrateVDS, error =
>>> Migration process cancelled, code = 82
>>>
>>> On the vdsm side I see:
>>>
>>> 2018-02-26 08:56:19,396+0000 INFO  (jsonrpc/0) [vdsm.api] START
>>> migrateCancel()
>>> from=::ffff:10.X.X.X,54654, flow_id=874d36d7-63f5-4b71-8a4d-6d9f3ec65858
>>> (api:46)
>>> 2018-02-26 08:56:19,398+0000 INFO  (jsonrpc/0) [vdsm.api] FINISH
>>> migrateCancel
>>> return={'status': {'message': 'Migration process cancelled', 'code': 82},
>>> 'progress': 0} from=::ffff:10.X.X.X,54654,
>>> flow_id=874d36d7-63f5-4b71-8a4d-6d9f3ec65858 (api:52)
>>>
>>> So no error on the vdsm side.
>>
>> Interesting.  The messages above indicate that an attempt was made to
>> migrate the VM, but the migration was temporarily rejected on the
>> destination due to the number of already running incoming migrations
>> (the limit is 2 incoming migrations by default).  Later, Vdsm was asked
>> to cancel the outgoing migration and it successfully set a migration
>> canceling flag.  However, the action was reported as an error to
>> Engine, due to hitting the incoming migration limit on the destination.
>> Maybe it's a bug, I'm not sure, resulting in minor confusion.  Normally
>> it shouldn't matter; the migration should be canceled shortly
>> afterwards anyway and Engine should be informed about that.
>>
>> However, the migration apparently wasn't canceled here.  I can't say
>> what happened without the complete Vdsm log.  One possible reason is
>> that the migration has been waiting for the completion of another
>> migration outgoing from the source (only one outgoing migration at a
>> time is allowed by default).  In any case it seems the migration either
>> wasn't actually started at all, or it just started being set up and was
>> never completely finished.
>>
>
> I'm attaching the log. Basically the storage backend was restarted by fencing
> and then this issue happened. This was on 26/02 at about 08:52 (log time).

Thank you for the log, but the VMs are already “migrating” at its
beginning, so there must have been some problem even earlier.

>>> I already tried restarting ovirt-engine but it didn't work.
>>
>> Here the problem is clearly on the Vdsm side.
>>
>>> Could someone shed some light on how to cancel the migration status for
>>> these
>>> machines? All of them seem to be running on the same host.
>>
>> Did the VMs get unblocked in the meantime?  I can't know what the
>
> No, they didn't. They're still in a "Migrating" state.
>
>> actual state of the given VMs is without seeing the complete Vdsm log,
>> so it's difficult to give good advice.  I think that a Vdsm restart on
>> the given host would help, BUT it's generally not a very good idea to
>> restart Vdsm if any real migration, outgoing or incoming, is running on
>> the host.  VMs that aren't actually being migrated at all (despite
>> being reported as migrating) should simply return to Up state after the
>> restart, but VMs with a real migration action pending might return to
>> Up state without proper cleanup, resulting in a different kind of mess
>> or maybe something even worse (things should improve in oVirt 4.2, but
>> it's still good to avoid Vdsm restarts with migrations running).
>>
>
> I assume this is not a real migration, as it has been in this state for
> several days. Would you advise restarting vdsm in this case then?

I'd say try it.  Since nothing has changed for several days, restarting
Vdsm looks like an appropriate action at this point.  Just don't make a
habit of it :-).

Regards,
Milan
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] VMs stuck in migrating state

2018-03-02 Thread Milan Zamazal
nico...@devels.es writes:

> We're running 4.1.9 and during the weekend we had a storage issue that
> seemed to leave some hosts in a strange state. One of the hosts has
> almost all VMs marked as migrating (although it doesn't seem to actually
> be migrating them) and the migration state cannot be cancelled.
>
> When clicking on one of those machines and selecting 'Cancel migration',
> in the ovirt-engine log I see:
>
> 2018-02-26 08:52:07,588Z INFO
> [org.ovirt.engine.core.vdsbroker.vdsbroker.CancelMigrateVDSCommand]
> (org.ovirt.thread.pool-6-thread-36) [887dfbf9-dece-4f7b-90a8-dac02b849b7f]
> HostName = host2.domain.com
> 2018-02-26 08:52:07,588Z ERROR
> [org.ovirt.engine.core.vdsbroker.vdsbroker.CancelMigrateVDSCommand]
> (org.ovirt.thread.pool-6-thread-36) [887dfbf9-dece-4f7b-90a8-dac02b849b7f]
> Command 'CancelMigrateVDSCommand(HostName = host2.domain.com,
> CancelMigrationVDSParameters:{runAsync='true',
> hostId='e63b9146-10c4-47ad-bd6c-f053a8c5b4eb',
> vmId='26d37e43-32e2-4e55-9c1e-1438518d5021'})' execution failed:
> VDSGenericException: VDSErrorException: Failed to CancelMigrateVDS, error =
> Migration process cancelled, code = 82
>
> On the vdsm side I see:
>
> 2018-02-26 08:56:19,396+0000 INFO  (jsonrpc/0) [vdsm.api] START
> migrateCancel()
> from=::ffff:10.X.X.X,54654, flow_id=874d36d7-63f5-4b71-8a4d-6d9f3ec65858
> (api:46)
> 2018-02-26 08:56:19,398+0000 INFO  (jsonrpc/0) [vdsm.api] FINISH migrateCancel
> return={'status': {'message': 'Migration process cancelled', 'code': 82},
> 'progress': 0} from=::ffff:10.X.X.X,54654,
> flow_id=874d36d7-63f5-4b71-8a4d-6d9f3ec65858 (api:52)
>
> So no error on the vdsm side.

Interesting.  The messages above indicate that an attempt was made to
migrate the VM, but the migration was temporarily rejected on the
destination due to the number of already running incoming migrations (the
limit is 2 incoming migrations by default).  Later, Vdsm was asked to
cancel the outgoing migration and it successfully set a migration
canceling flag.  However, the action was reported as an error to Engine,
due to hitting the incoming migration limit on the destination.  Maybe
it's a bug, I'm not sure, resulting in minor confusion.  Normally it
shouldn't matter; the migration should be canceled shortly afterwards
anyway and Engine should be informed about that.
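
By the way, cancellation can also be requested outside the UI, through
the REST API or the SDK.  Here's a minimal, untested sketch with the
Python SDK v4; the URL, credentials and VM name are placeholders, not
anything from your setup:

  # Minimal sketch using ovirt-engine-sdk4; adjust URL, credentials, VM name.
  import ovirtsdk4 as sdk

  connection = sdk.Connection(
      url='https://engine.example.com/ovirt-engine/api',
      username='admin@internal',
      password='password',
      insecure=True,  # use ca_file='...' instead for a verified connection
  )
  vms_service = connection.system_service().vms_service()
  vm = vms_service.list(search='name=myvm')[0]
  # Triggers the same CancelMigrateVDS flow you saw in the Engine log.
  vms_service.vm_service(vm.id).cancel_migration()
  connection.close()

Note it still goes through Engine, so in your situation it would most
likely fail the same way the UI action does.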

However, the migration apparently wasn't canceled here.  I can't say what
happened without the complete Vdsm log.  One possible reason is that the
migration has been waiting for the completion of another migration
outgoing from the source (only one outgoing migration at a time is
allowed by default).  In any case it seems the migration either wasn't
actually started at all, or it just started being set up and was never
completely finished.
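
If you want to see what the host itself thinks, you can query Vdsm
directly on the host.  A small sketch using Vdsm's Python client (it
should be available on 4.1 hosts; the vmID is the UUID from your Engine
log, 54321 is Vdsm's default port, and migrationProgress is my assumption
of what getStats reports during a migration):

  # Run on the host; prints the VM status and, if reported, migration progress.
  from vdsm import client

  cli = client.connect('localhost', 54321)
  stats = cli.VM.getStats(vmID='26d37e43-32e2-4e55-9c1e-1438518d5021')[0]
  print(stats.get('status'), stats.get('migrationProgress'))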

> I already tried restarting ovirt-engine but it didn't work.

Here the problem is clearly on the Vdsm side.

> Could someone shed some light on how to cancel the migration status for these
> machines? All of them seem to be running on the same host.

Did the VMs get unblocked in the meantime?  I can't know what the actual
state of the given VMs is without seeing the complete Vdsm log, so it's
difficult to give good advice.  I think that a Vdsm restart on the given
host would help, BUT it's generally not a very good idea to restart Vdsm
if any real migration, outgoing or incoming, is running on the host.  VMs
that aren't actually being migrated at all (despite being reported as
migrating) should simply return to Up state after the restart, but VMs
with a real migration action pending might return to Up state without
proper cleanup, resulting in a different kind of mess or maybe something
even worse (things should improve in oVirt 4.2, but it's still good to
avoid Vdsm restarts with migrations running).
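
As a rough safety check before restarting, you can first ask Vdsm whether
any VM on the host still reports a migration state.  A sketch under the
same assumptions as above (the status strings are what I believe Vdsm
uses; keep in mind this only shows what Vdsm *reports*, which in a stuck
case is exactly what's in doubt):

  # Refuse to restart Vdsm while any VM still reports a migration state.
  import subprocess
  from vdsm import client

  cli = client.connect('localhost', 54321)
  migrating = []
  for vm_id in cli.Host.getVMList():
      stats = cli.VM.getStats(vmID=vm_id)[0]
      if stats.get('status') in ('Migration Source', 'Migration Destination'):
          migrating.append(vm_id)
  if migrating:
      print('Migration states still reported, not restarting:', migrating)
  else:
      subprocess.check_call(['systemctl', 'restart', 'vdsmd'])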

Regards,
Milan
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users