Re: [ovirt-users] Datacenter unresponsive: recovery procedure?

2015-04-16 Thread Jorick Astrego


On 04/16/2015 11:12 AM, Andrea Ghelardi wrote:

 (sorry: resending as I wasn’t part of the list, yet)

  

 hi,

 This is my first post, so hello all and thank you for reading.

 I have an issue with my production oVirt environment (3.5.1.1-1.el6).

  

 My system consists of several datacenters.

 Two of them are connected to an iSCSI SAN and were working fine,

 until I had the bad idea of deleting a SAN volume from the SAN
 manager before removing the associated storage in oVirt. From that
 moment, the DC where this storage was mounted became unresponsive:
 it cannot attach the master storage (or any other).

 I tried to

 1) manually destroy the offending storage (select - destroy) but
 still cannot recover the situation.

 2) right click on master storage and activate it

 3) re-initialize the datacenter using a NFS storage from the working
 sister DC.

  

 All Hosts are still running even though their status is unknown.

 All VMs are still running even though their status is not responding.

  

 I half resolved the issue by manually restarting the host where the
 datastore was originally mounted. This cleared the orphaned multipath.

 However, the SPM still does not come up.

 This is an extract of the log:

 2015-04-16 03:51:48,069 WARN
 [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand]
 (DefaultQuartzScheduler_Worker-14) [61a44b19] could not stop spm of
 pool 0002-0002-0002-0002-009c on vds
 89254f23-8748-402a-afc9-08438dca0975 - reason:
 org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException:
 VDSGenericException: VDSNetworkException: Message timeout which can be
 caused by communication issues

 2015-04-16 03:51:48,072 INFO
 [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand]
 (DefaultQuartzScheduler_Worker-14) [61a44b19] FINISH,
 SpmStopVDSCommand, log id: 4354cf46

 2015-04-16 03:51:48,072 WARN
 [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData]
 (DefaultQuartzScheduler_Worker-14) [61a44b19] spm stop on spm failed,
 stopping spm selection!

 2015-04-16 03:51:58,223 INFO
 [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData]
 (DefaultQuartzScheduler_Worker-4) [4ca2d938] hostFromVds::selectedVds
 - Brachetto, spmStatus Free, storage pool IRDC-INTEL

 2015-04-16 03:51:58,225 ERROR
 [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData]
 (DefaultQuartzScheduler_Worker-4) [4ca2d938] SPM Init: could not find
 reported vds or not up - pool:IRDC-INTEL vds_spm_id: 3

 2015-04-16 03:51:58,239 INFO
 [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData]
 (DefaultQuartzScheduler_Worker-4) [4ca2d938] SPM selection - vds seems
 as spm sovana

 2015-04-16 03:51:58,252 INFO
 [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand]
 (DefaultQuartzScheduler_Worker-4) [4ca2d938] START,
 SpmStopVDSCommand(HostName = sovana, HostId =
 89254f23-8748-402a-afc9-08438dca0975, storagePoolId =
 0002-0002-0002-0002-009c), log id: 63a17687

  

 storagePoolId = 0002-0002-0002-0002-009c is (was)
 hertz-dstore2, which no longer exists on the SAN or in oVirt.

 HostId 89254f23-8748-402a-afc9-08438dca0975 is the sovana server
 (current SPM).

  

  

  

  

 I’m thinking about:

 1) Putting the hosted engine host into maintenance

 2) Shutting down the oVirt Manager

 3) Rebooting the SPM server

 4) Restarting the oVirt Manager

 5) Taking the hosted engine host out of maintenance

  

  

 Any help or clue is highly welcome, with cheers and beers.

 Thank you!




I had comparable issues nearly a year ago after a failed iSCSI failover
that ended in a split brain. I wasn't able to recover from it.

https://bugzilla.redhat.com/show_bug.cgi?id=1108576





With kind regards,

Jorick Astrego

Netbulae Virtualization Experts 



Tel: 053 20 30 270    i...@netbulae.eu    Staalsteden 4-3A    KvK 08198180
Fax: 053 20 30 271    www.netbulae.eu     7547 TA Enschede    BTW NL821234584B01



___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] Datacenter unresponsive: recovery procedure?

2015-04-16 Thread Andrea Ghelardi
(sorry: resending as I wasn’t part of the list, yet)



hi,

This is my first post, so hello all and thank you for reading.

I have an issue with my production oVirt environment (3.5.1.1-1.el6).



My system consists of several datacenters.

Two of them are connected to an iSCSI SAN and were working fine,

until I had the bad idea of deleting a SAN volume from the SAN manager
before removing the associated storage in oVirt. From that moment, the DC
where this storage was mounted became unresponsive: it cannot attach the
master storage (or any other).

I tried to

1) manually destroy the offending storage (select - destroy) but still
cannot recover the situation.

2) right click on master storage and activate it

3) re-initialize the datacenter using a NFS storage from the working sister
DC.



All Hosts are still running even though their status is unknown.

All VMs are still running even though their status is not responding.



I half resolved the issue by manually restarting the host where the
datastore was originally mounted. This cleared the orphaned multipath.

However, the SPM still does not come up.

This is an extract of the log:

2015-04-16 03:51:48,069 WARN
[org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand]
(DefaultQuartzScheduler_Worker-14) [61a44b19] could not stop spm of pool
0002-0002-0002-0002-009c on vds
89254f23-8748-402a-afc9-08438dca0975 - reason:
org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException:
VDSGenericException: VDSNetworkException: Message timeout which can be
caused by communication issues

2015-04-16 03:51:48,072 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand]
(DefaultQuartzScheduler_Worker-14) [61a44b19] FINISH, SpmStopVDSCommand,
log id: 4354cf46

2015-04-16 03:51:48,072 WARN
[org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData]
(DefaultQuartzScheduler_Worker-14) [61a44b19] spm stop on spm failed,
stopping spm selection!

2015-04-16 03:51:58,223 INFO
[org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData]
(DefaultQuartzScheduler_Worker-4) [4ca2d938] hostFromVds::selectedVds -
Brachetto, spmStatus Free, storage pool IRDC-INTEL

2015-04-16 03:51:58,225 ERROR
[org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData]
(DefaultQuartzScheduler_Worker-4) [4ca2d938] SPM Init: could not find
reported vds or not up - pool:IRDC-INTEL vds_spm_id: 3

2015-04-16 03:51:58,239 INFO
[org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData]
(DefaultQuartzScheduler_Worker-4) [4ca2d938] SPM selection - vds seems as
spm sovana

2015-04-16 03:51:58,252 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand]
(DefaultQuartzScheduler_Worker-4) [4ca2d938] START,
SpmStopVDSCommand(HostName = sovana, HostId =
89254f23-8748-402a-afc9-08438dca0975, storagePoolId =
0002-0002-0002-0002-009c), log id: 63a17687



storagePoolId = 0002-0002-0002-0002-009c is (was) hertz-dstore2,
which no longer exists on the SAN or in oVirt.

HostId 89254f23-8748-402a-afc9-08438dca0975 is the sovana server (current
SPM).
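
For anyone who wants to pull the warnings and errors out of a longer engine
log, here is a minimal Python sketch. It assumes every entry follows the
layout seen in the extract above: timestamp, level, logger class in square
brackets, thread name in parentheses, a bracketed ID (assumed here to be a
correlation ID), then the message. ENTRY_RE, parse_entry and problems are
hypothetical helper names, not part of oVirt, and the sample entries are
copied from the extract.

```python
import re
from datetime import datetime

# Assumed entry layout, inferred only from the extract in this mail:
#   <timestamp> <LEVEL> [<logger class>] (<thread>) [<id>] <message>
ENTRY_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+"
    r"\((?P<thread>[^)]+)\)\s+"
    r"\[(?P<corr>[^\]]+)\]\s+"
    r"(?P<msg>.*)"
)


def parse_entry(text):
    """Parse one log entry (possibly wrapped over several lines) into a dict.

    Returns None when the text does not match the assumed layout.
    """
    flat = " ".join(text.split())  # undo the mail client's line wrapping
    match = ENTRY_RE.match(flat)
    if match is None:
        return None
    entry = match.groupdict()
    entry["ts"] = datetime.strptime(entry["ts"], "%Y-%m-%d %H:%M:%S,%f")
    # Keep only the short class name, e.g. "SpmStopVDSCommand".
    entry["logger"] = entry["logger"].rsplit(".", 1)[-1]
    return entry


def problems(entries):
    """Return only the parsed entries logged at WARN or ERROR level."""
    return [e for e in entries if e and e["level"] in ("WARN", "ERROR")]


# Three entries copied from the extract, with the mail wrapping left in place.
SAMPLE_ENTRIES = [
    """2015-04-16 03:51:48,069 WARN
[org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand]
(DefaultQuartzScheduler_Worker-14) [61a44b19] could not stop spm of pool
0002-0002-0002-0002-009c on vds
89254f23-8748-402a-afc9-08438dca0975 - reason:
org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException:
VDSGenericException: VDSNetworkException: Message timeout which can be
caused by communication issues""",
    """2015-04-16 03:51:48,072 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand]
(DefaultQuartzScheduler_Worker-14) [61a44b19] FINISH, SpmStopVDSCommand,
log id: 4354cf46""",
    """2015-04-16 03:51:58,225 ERROR
[org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData]
(DefaultQuartzScheduler_Worker-4) [4ca2d938] SPM Init: could not find
reported vds or not up - pool:IRDC-INTEL vds_spm_id: 3""",
]

if __name__ == "__main__":
    parsed = [parse_entry(text) for text in SAMPLE_ENTRIES]
    for entry in problems(parsed):
        print(entry["ts"], entry["level"], entry["logger"], "-", entry["msg"])
```

Flattening each entry before matching is what lets the same pattern work on
wrapped mail text and on the single-line entries of a real log file.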









I’m thinking about:

1) Putting the hosted engine host into maintenance

2) Shutting down the oVirt Manager

3) Rebooting the SPM server

4) Restarting the oVirt Manager

5) Taking the hosted engine host out of maintenance





Any help or clue is highly welcome, with cheers and beers.

Thank you!

Andrea