[ovirt-users] Re: Cannot activate/deactivate storage domain

2019-12-04 Thread Albl, Oliver
Hi Amit,

It is inactive, but not in maintenance mode.

Thank you,
Oliver

From: Amit Bawer
Sent: Wednesday, December 4, 2019 16:36
To: Albl, Oliver
Cc: users@ovirt.org; Nir Soffer
Subject: Re: [ovirt-users] Re: Cannot activate/deactivate storage domain



In a check we ran here, we got similar warnings when using the "Ignore OVF
update failure" option, but the SD was set to inactive at the end of the process.

What is the SD status in your case after this attempt?

[ovirt-users] Re: Cannot activate/deactivate storage domain

2019-12-04 Thread Albl, Oliver


smime.p7m
Description: S/MIME encrypted message
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/4Q4ATO7KOMUF4PKRVPHA3BRMTLWSUWEU/


[ovirt-users] Re: Cannot activate/deactivate storage domain

2019-12-04 Thread Amit Bawer
In a check we ran here, we got similar warnings when using the "Ignore OVF
update failure" option, but the SD was set to inactive at the end of the process.
What is the SD status in your case after this attempt?


On Wed, Dec 4, 2019 at 4:49 PM Albl, Oliver 
wrote:

> Yes.

[ovirt-users] Re: Cannot activate/deactivate storage domain

2019-12-04 Thread Albl, Oliver
Yes.

On Dec 4, 2019, at 15:47, Amit Bawer wrote:



On Wed, Dec 4, 2019 at 4:42 PM Albl, Oliver wrote:
Hi Amit,

Unfortunately, no success.

Dec 4, 2019, 3:41:36 PM
Storage Domain HOST_LUN_219 (Data Center xxx) was deactivated by system because 
it's not visible by any of the hosts.

Dec 4, 2019, 3:35:09 PM
Failed to update VMs/Templates OVF data for Storage Domain HOST_LUN_219 in Data 
Center Production.

Dec 4, 2019, 3:35:09 PM
Failed to update OVF disks 77c64b39-fe50-4d05-b77f-8131ad1f95f9, OVF data isn't 
updated on those OVF stores (Data Center Production, Storage Domain 
HOST_LUN_219).

Have you selected the checkbox for "Ignore OVF update failure" before putting
it into maintenance?


[ovirt-users] Re: Cannot activate/deactivate storage domain

2019-12-04 Thread Albl, Oliver
Hi Amit,

Unfortunately, no success.

Dec 4, 2019, 3:41:36 PM
Storage Domain HOST_LUN_219 (Data Center xxx) was deactivated by system because
it's not visible by any of the hosts.

Dec 4, 2019, 3:35:09 PM
Failed to update VMs/Templates OVF data for Storage Domain HOST_LUN_219 in Data
Center Production.

Dec 4, 2019, 3:35:09 PM
Failed to update OVF disks 77c64b39-fe50-4d05-b77f-8131ad1f95f9, OVF data isn't
updated on those OVF stores (Data Center Production, Storage Domain
HOST_LUN_219).

All the best,
Oliver

From: Amit Bawer
Sent: Wednesday, December 4, 2019 15:20
To: Albl, Oliver
Cc: users@ovirt.org; Nir Soffer
Subject: Re: [ovirt-users] Re: Cannot activate/deactivate storage domain

Hi Oliver,

To deactivate the unresponsive storage domains, you can use the Compute ->
Data Centers -> Maintenance option with "Ignore OVF update failure" checked.
This will force deactivation of the SD.

Will provide further details about the issue in the ticket.


[ovirt-users] Re: Cannot activate/deactivate storage domain

2019-12-04 Thread Amit Bawer
On Wed, Dec 4, 2019 at 4:42 PM Albl, Oliver 
wrote:

> Hi Amit,
>
> Unfortunately, no success.
>
> Dec 4, 2019, 3:41:36 PM
> Storage Domain HOST_LUN_219 (Data Center xxx) was deactivated by system
> because it's not visible by any of the hosts.
>
> Dec 4, 2019, 3:35:09 PM
> Failed to update VMs/Templates OVF data for Storage Domain HOST_LUN_219 in
> Data Center Production.
>
> Dec 4, 2019, 3:35:09 PM
> Failed to update OVF disks 77c64b39-fe50-4d05-b77f-8131ad1f95f9, OVF data
> isn't updated on those OVF stores (Data Center Production, Storage Domain
> HOST_LUN_219).
>

Have you selected the checkbox for "Ignore OVF update failure" before
putting it into maintenance?


[ovirt-users] Re: Cannot activate/deactivate storage domain

2019-12-04 Thread Albl, Oliver


smime.p7m
Description: S/MIME encrypted message


[ovirt-users] Re: Cannot activate/deactivate storage domain

2019-12-04 Thread Amit Bawer
Hi Oliver,

To deactivate the unresponsive storage domains, you can use the Compute
-> Data Centers -> Maintenance option with "Ignore OVF update failure"
checked.
This will force deactivation of the SD.

Will provide further details about the issue in the ticket.
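The same forced deactivation can, as far as we understand the oVirt REST API v4, also be requested through the attached-storage-domain `deactivate` action with `force` set to true, which the API documentation describes as letting the deactivation succeed even if the OVF update fails. The sketch below only builds that request with the Python standard library and does not send it; the engine URL, data center and storage domain IDs, and credentials are placeholders, and the endpoint and the `force` parameter are assumptions to verify against the documentation for your engine version.

```python
import base64
import urllib.request

def build_force_deactivate(engine: str, dc_id: str, sd_id: str,
                           user: str, password: str) -> urllib.request.Request:
    """Build (without sending) a forced deactivate request for an attached storage domain."""
    url = (f"{engine.rstrip('/')}/ovirt-engine/api/datacenters/{dc_id}"
           f"/storagedomains/{sd_id}/deactivate")
    # Assumption: force=true is the API counterpart of "Ignore OVF update failure".
    body = b"<action><force>true</force></action>"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, data=body, method="POST")
    req.add_header("Content-Type", "application/xml")
    req.add_header("Accept", "application/xml")
    req.add_header("Authorization", f"Basic {token}")
    return req

# Placeholder values only; nothing is sent until the request is passed to urlopen().
req = build_force_deactivate("https://engine.example.com", "DC_ID", "SD_ID",
                             "admin@internal", "secret")
print(req.get_method(), req.full_url)
```

Sending it would be `urllib.request.urlopen(req, context=ctx)`, with `ctx` an SSL context (for example from `ssl.create_default_context(cafile=...)`) that trusts the engine CA.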


On Tue, Dec 3, 2019 at 12:02 PM Albl, Oliver 
wrote:

> Hi,
>
> Does anybody have advice on how to activate or safely remove that
> storage domain?
>
> Thank you!
> Oliver


[ovirt-users] Re: Cannot activate/deactivate storage domain

2019-12-03 Thread Albl, Oliver
Hi,

Does anybody have advice on how to activate or safely remove that storage
domain?

Thank you!
Oliver
-----Original Message-----
From: Oliver Albl
Sent: Tuesday, November 5, 2019 11:20
To: users@ovirt.org
Subject: [ovirt-users] Re: Cannot activate/deactivate storage domain

> On Mon, Nov 4, 2019 at 9:18 PM Albl, Oliver  wrote:
> 
> What was the last change in the system? An upgrade? A network change? A
> storage change?
> 

The last change was an oVirt upgrade from 4.3.3 to 4.3.6.7 four weeks ago
(including updating the CentOS hosts to 7.7 1908).

> 
> This is expected if some domain is not accessible on all hosts.
> 
> 
> This means sanlock timed out renewing the lockspace.
> 
> 
> If a host cannot access all storage domains in the DC, the system sets
> it to non-operational, and will probably try to reconnect it later.
> 
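The rule described here (a host that cannot reach every storage domain in the data center is set to non-operational), together with the engine event Oliver quotes later in the thread (a domain that no host can see is deactivated by the system), can be illustrated with a toy model. This is a hypothetical sketch, not engine code; the function name, the host-to-domains mapping, and the names `host1`, `host2` and `DATA_1` are invented for illustration.

```python
from typing import Dict, List, Set, Tuple

def evaluate_visibility(dc_domains: Set[str],
                        visible: Dict[str, Set[str]]) -> Tuple[List[str], List[str]]:
    """Return (non_operational_hosts, unreachable_domains) for a toy data center.

    dc_domains: all storage domains attached to the data center.
    visible: for each host, the set of domains that host can currently access.
    """
    # A host that misses any domain of the data center is flagged non-operational.
    non_operational = sorted(h for h, seen in visible.items() if not dc_domains <= seen)
    # A domain that no host can access is a candidate for deactivation by the system.
    seen_by_any = set().union(*visible.values())
    unreachable = sorted(dc_domains - seen_by_any)
    return non_operational, unreachable

hosts = {"host1": {"DATA_1", "HOST_LUN_219"}, "host2": {"DATA_1"}}
print(evaluate_visibility({"DATA_1", "HOST_LUN_219"}, hosts))  # (['host2'], [])
```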
> 
> This means reading 4k from the start of the metadata LV took 9.6 seconds.
> Something on the path to storage is bad (kernel, network, storage).
> 
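To take a comparable measurement by hand (time spent reading 4 KiB from the start of a device), one can time a single read. A minimal sketch, assuming Python 3.7+ on Linux; it is not vdsm's own checker, and the helper name is hypothetical. A buffered read may be served from the page cache, so `direct=True` uses `O_DIRECT` with a page-aligned buffer from an anonymous `mmap`.

```python
import mmap
import os
import time

def timed_read(path: str, size: int = 4096, direct: bool = False) -> float:
    """Read `size` bytes at offset 0 of `path` and return the elapsed seconds.

    With direct=True the read uses O_DIRECT (Linux only) to bypass the page cache;
    the anonymous mmap provides the page-aligned buffer that O_DIRECT requires.
    This only measures the read: it cannot abort a read that hangs.
    """
    flags = os.O_RDONLY | (os.O_DIRECT if direct else 0)
    fd = os.open(path, flags)
    buf = mmap.mmap(-1, size)
    try:
        start = time.monotonic()
        os.preadv(fd, [buf], 0)
        return time.monotonic() - start
    finally:
        buf.close()
        os.close(fd)
```

On a host this might be called as `timed_read("/dev/<domain-uuid>/metadata", direct=True)`; that LV path is an assumption about the block-domain layout, so check it with `lvs` first. Results approaching 10 seconds would line up with the monitoring timeout described in the thread.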
> 
> We have a 20-second grace time (4 retries, 5 seconds per retry) in multipath
> when there are no active paths, before I/O fails and the VM is paused. We
> also resume paused VMs when storage monitoring works again, so maybe
> the VMs were paused and resumed.
> 
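The 20-second figure is the product of two multipath settings: with no usable path, multipath keeps queueing I/O for `no_path_retry` checker intervals of `polling_interval` seconds each before failing it. A small sketch that derives the grace time from multipath.conf text, assuming the values 4 and 5 quoted above. The parser is deliberately naive (the helper names are hypothetical, and it ignores device-specific sections and built-in hardware defaults).

```python
import re
from typing import Optional

def _setting(conf: str, name: str) -> Optional[str]:
    """Return the first value of `name` in multipath.conf text (naive: ignores sections)."""
    for raw in conf.splitlines():
        line = raw.split("#", 1)[0].strip()
        m = re.match(rf"{name}\s+(\S+)$", line)
        if m:
            return m.group(1).strip('"')
    return None

def queueing_grace_seconds(conf: str) -> Optional[float]:
    """Seconds multipath keeps queueing I/O with no usable path; None means forever."""
    interval = float(_setting(conf, "polling_interval") or 5)  # multipath default: 5 s
    retry = _setting(conf, "no_path_retry") or "fail"         # multipath default: fail
    if retry == "queue":
        return None
    if retry == "fail":
        return 0.0
    return int(retry) * interval

SAMPLE = """
defaults {
    polling_interval 5
    no_path_retry    4
}
"""
print(queueing_grace_seconds(SAMPLE))  # 20.0
```
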
> However, for storage monitoring we have a strict 10-second timeout. If
> reading from the metadata LV times out or fails and is still not working
> normally after 5 minutes, the domain will become inactive.
> 
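The monitoring rule above (each metadata read must finish within 10 seconds, and a domain that keeps failing for 5 minutes becomes inactive) can be modelled as a small state machine. This is a hypothetical sketch of the rule as stated in the thread, not vdsm or engine code; the class and method names are invented, and timestamps are passed in explicitly so the logic is easy to follow. What happens after the threshold is reached is not modelled.

```python
from dataclasses import dataclass
from typing import Optional

CHECK_TIMEOUT = 10.0    # a metadata read slower than this counts as a failure
INACTIVE_AFTER = 300.0  # 5 minutes of continuous failure

@dataclass
class DomainMonitor:
    """Toy model of the rule described in the thread (not vdsm or engine code)."""
    failing_since: Optional[float] = None

    def record(self, now: float, read_seconds: Optional[float]) -> bool:
        """Record one check at time `now`; read_seconds=None means the read failed.

        Returns True once the domain has been failing for INACTIVE_AFTER seconds.
        """
        if read_seconds is not None and read_seconds < CHECK_TIMEOUT:
            self.failing_since = None  # a healthy read closes the failure window
            return False
        if self.failing_since is None:
            self.failing_since = now
        return now - self.failing_since >= INACTIVE_AFTER

m = DomainMonitor()
print(m.record(0, 9.6))     # False: slow, like the 9.6 s read, but under the timeout
print(m.record(10, None))   # False: the failure window starts at t=10
print(m.record(310, 12.0))  # True: still failing 300 s later
```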
> 
> This can explain the read timeouts.
> 
> 
> This looks like the right way to troubleshoot this.
> 
> 
> We need vdsm logs to understand this failure.
> 
> 
> This does not mean the OVF is corrupted, only that we could not store new
> data. The older data on the other OVFSTORE disk is probably fine.
> Hopefully the system will not try to write to the other OVFSTORE disk,
> overwriting the last good version.
> 
> 
> This is normal; the first 2048 bytes are always zeroes. This area was
> used for domain metadata in older versions.
> 
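When inspecting a dump of the start of the metadata area, this remark gives a quick sanity check: the first 2048 bytes should be all zeroes. A minimal sketch, assuming the dump was saved to a file beforehand; both helper names are hypothetical.

```python
def leading_zeros_ok(data: bytes, length: int = 2048) -> bool:
    """True if `data` holds at least `length` bytes and all of the first `length` are zero."""
    return data[:length] == bytes(length)

def check_dump(path: str, length: int = 2048) -> bool:
    """Apply leading_zeros_ok to the start of a previously saved dump file."""
    with open(path, "rb") as f:
        return leading_zeros_ok(f.read(length), length)
```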
> 
> Please share more details:
> 
> - output of "lsblk"
> - output of "multipath -ll"
> - output of "/usr/libexec/vdsm/fc-scan -v"
> - output of "vgs -o +tags problem-domain-id"
> - output of "lvs -o +tags problem-domain-id"
> - contents of /etc/multipath.conf
> - contents of /etc/multipath.conf.d/*.conf
> - /var/log/messages since the issue started
> - /var/log/vdsm/vdsm.log* since the issue started on one of the hosts
> 
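Collecting the command outputs in this list can be scripted so that every host reports the same set. A minimal sketch, assuming Python 3.6+ and root on the host. `DOMAIN_ID` stands for the problem domain's UUID (the `problem-domain-id` placeholder above) and the helper name is hypothetical. The configuration files and logs are simpler to attach to the bug directly.

```python
import shlex
import subprocess
from typing import Dict, List

DOMAIN_ID = "problem-domain-id"  # placeholder: replace with the storage domain UUID

COMMANDS: List[List[str]] = [
    ["lsblk"],
    ["multipath", "-ll"],
    ["/usr/libexec/vdsm/fc-scan", "-v"],
    ["vgs", "-o", "+tags", DOMAIN_ID],
    ["lvs", "-o", "+tags", DOMAIN_ID],
]

def collect(commands: List[List[str]], timeout: float = 120.0) -> Dict[str, str]:
    """Run each command and map its shell-quoted form to exit status plus combined output."""
    results: Dict[str, str] = {}
    for argv in commands:
        key = " ".join(shlex.quote(a) for a in argv)
        try:
            proc = subprocess.run(argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                                  universal_newlines=True, timeout=timeout)
            results[key] = f"[exit {proc.returncode}]\n{proc.stdout}"
        except (OSError, subprocess.TimeoutExpired) as exc:
            # Missing binaries or hung storage commands are recorded instead of aborting the run.
            results[key] = f"[error] {exc}"
    return results

# On a host: for cmd, out in collect(COMMANDS).items(): print(f"### {cmd}\n{out}")
```
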
> A bug is probably the best place to keep these logs and make it easy to track.

Please see https://bugzilla.redhat.com/show_bug.cgi?id=1768821

> 
> Thanks,
> Nir

Thank you!
Oliver


[ovirt-users] Re: Cannot activate/deactivate storage domain

2019-11-28 Thread Albl, Oliver
Hi,

Any ideas whether or how I can recover the storage domain? I will need to destroy
it, as the ongoing SCSI scans are becoming an impediment.

Thank you and all the best,
Oliver

-----Original Message-----
From: Oliver Albl
Sent: Tuesday, November 5, 2019 11:20
To: users@ovirt.org
Subject: [ovirt-users] Re: Cannot activate/deactivate storage domain

> On Mon, Nov 4, 2019 at 9:18 PM Albl, Oliver  wrote:
>
> What was the last change in the system? upgrade? network change? storage
> change?
>

The last change, four weeks ago, was an oVirt upgrade from 4.3.3 to 4.3.6.7
(including upgrading the CentOS hosts to 7.7 1908).

>
> This is expected if some domain is not accessible on all hosts.
>
>
> This means sanlock timed out renewing the lockspace
>
>
> If a host cannot access all storage domains in the DC, the system sets
> it to non-operational, and will probably try to reconnect it later.
>
>
> This means reading 4k from the start of the metadata LV took 9.6 seconds.
> Something on the way to storage is bad (kernel, network, storage).
>
>
> We have a 20-second grace time (4 retries, 5 seconds per retry) in
> multipath when there are no active paths, before I/O fails, pausing the
> VM. We also resume paused VMs when storage monitoring works again, so
> maybe the VMs were paused and resumed.
>
> However, for storage monitoring we have a strict 10-second timeout. If
> reading from the metadata LV times out or fails and does not operate
> normally after 5 minutes, the domain will become inactive.
>
>
> This can explain the read timeouts.
>
>
> This looks like the right way to troubleshoot this.
>
>
> We need vdsm logs to understand this failure.
>
>
> This does not mean the OVF is corrupted, only that we could not store new
> data. The older data on the other OVFSTORE disk is probably fine.
> Hopefully the system will not try to write to the other OVFSTORE disk,
> overwriting the last good version.
>
>
> This is normal, the first 2048 bytes are always zeroes. This area was
> used for domain metadata in older versions.
>
>
> Please share more details:
>
> - output of "lsblk"
> - output of "multipath -ll"
> - output of "/usr/libexec/vdsm/fc-scan -v"
> - output of "vgs -o +tags problem-domain-id"
> - output of "lvs -o +tags problem-domain-id"
> - contents of /etc/multipath.conf
> - contents of /etc/multipath.conf.d/*.conf
> - /var/log/messages since the issue started
> - /var/log/vdsm/vdsm.log* since the issue started on one of the hosts
>
> A bug is probably the best place to keep these logs and make them easy to
> track.

Please see https://bugzilla.redhat.com/show_bug.cgi?id=1768821

>
> Thanks,
> Nir

Thank you!
Oliver
___
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/ZF2JJRFTP43XZNLFYXQIAOJKVDGYKAHL/


[ovirt-users] Re: Cannot activate/deactivate storage domain

2019-11-28 Thread Albl, Oliver


smime.p7m
Description: S/MIME encrypted message
___
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/GSBKAWBDS432YCFCI76AOE4TKZDK72F6/


[ovirt-users] Re: Cannot activate/deactivate storage domain

2019-11-05 Thread Oliver Albl
> On Mon, Nov 4, 2019 at 9:18 PM Albl, Oliver  wrote:
> 
> What was the last change in the system? upgrade? network change? storage 
> change?
> 

The last change, four weeks ago, was an oVirt upgrade from 4.3.3 to 4.3.6.7
(including upgrading the CentOS hosts to 7.7 1908).

> 
> This is expected if some domain is not accessible on all hosts.
> 
> 
> This means sanlock timed out renewing the lockspace
> 
> 
> If a host cannot access all storage domains in the DC, the system sets
> it to non-operational, and will probably try to reconnect it later.
> 
> 
> This means reading 4k from the start of the metadata LV took 9.6 seconds.
> Something on the way to storage is bad (kernel, network, storage).
> 
> 
> We have a 20-second grace time (4 retries, 5 seconds per retry) in
> multipath when there are no active paths, before I/O fails, pausing the
> VM. We also resume paused VMs when storage monitoring works again, so
> maybe the VMs were paused and resumed.
> 
> However, for storage monitoring we have a strict 10-second timeout. If
> reading from the metadata LV times out or fails and does not operate
> normally after 5 minutes, the domain will become inactive.
> 
> 
> This can explain the read timeouts.
> 
> 
> This looks like the right way to troubleshoot this.
> 
> 
> We need vdsm logs to understand this failure.
> 
> 
> This does not mean the OVF is corrupted, only that we could not store new
> data. The older data on the other OVFSTORE disk is probably fine.
> Hopefully the system will not try to write to the other OVFSTORE disk,
> overwriting the last good version.
> 
> 
> This is normal, the first 2048 bytes are always zeroes. This area was
> used for domain metadata in older versions.
> 
> 
> Please share more details:
> 
> - output of "lsblk"
> - output of "multipath -ll"
> - output of "/usr/libexec/vdsm/fc-scan -v"
> - output of "vgs -o +tags problem-domain-id"
> - output of "lvs -o +tags problem-domain-id"
> - contents of /etc/multipath.conf
> - contents of /etc/multipath.conf.d/*.conf
> - /var/log/messages since the issue started
> - /var/log/vdsm/vdsm.log* since the issue started on one of the hosts
> 
> A bug is probably the best place to keep these logs and make them easy to track.

Please see https://bugzilla.redhat.com/show_bug.cgi?id=1768821

> 
> Thanks,
> Nir

Thank you!
Oliver
___
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/QZ5ZN2S7N54JYVV3RWOYOHTEAWFQ23Q7/


[ovirt-users] Re: Cannot activate/deactivate storage domain

2019-11-05 Thread Nir Soffer
On Mon, Nov 4, 2019 at 9:18 PM Albl, Oliver  wrote:
>
> Hi all,
>   I run an oVirt 4.3.6.7-1.el7 installation (50+ hosts, 40+ FC storage
> domains on two all-flash arrays) and experienced a problem accessing
> individual storage domains.

What was the last change in the system? upgrade? network change? storage change?

> As a result, hosts were set to “not operational” because they could not see
> all storage domains, and the SPM role started to move between hosts.

This is expected if some domain is not accessible on all hosts.

> oVirt messages start with:
>
> 2019-11-04 15:10:22.739+01 | VDSM HOST082 command SpmStatusVDS failed: (-202, 
> 'Sanlock resource read failure', 'IO timeout')

This means sanlock timed out renewing the lockspace

> 2019-11-04 15:13:58.836+01 | Host HOST017 cannot access the Storage Domain(s) 
> HOST_LUN_204 attached to the Data Center . Setting Host state to 
> Non-Operational.

If a host cannot access all storage domains in the DC, the system sets
it to non-operational, and will probably try to reconnect it later.
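
The rule can be pictured as a set comparison. This is a minimal sketch for illustration only: `host_state` and its return values are hypothetical names, not oVirt engine code, and the real logic is more involved.

```python
from __future__ import annotations


def host_state(dc_domains: set[str], visible_domains: set[str]) -> str:
    """Hypothetical model of the rule above, not oVirt engine code.

    A host that cannot see every storage domain attached to the data center
    is treated as non-operational; the system may try to reconnect it later.
    """
    missing = dc_domains - visible_domains
    return "NonOperational" if missing else "Up"


dc = {"HOST_LUN_204", "HOST_LUN_219", "HOST_LUN_221"}
print(host_state(dc, dc))                     # Up
print(host_state(dc, dc - {"HOST_LUN_204"}))  # NonOperational
```

Since one missing domain is enough, a single slow LUN can make many hosts non-operational at once, which is consistent with what was reported above.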

> 2019-11-04 15:15:14.145+01 | Storage domain HOST_LUN_221 experienced a high 
> latency of 9.60953 seconds from host HOST038. This may cause performance and 
> functional issues. Please consult your Storage Administrator.

This means reading 4k from the start of the metadata LV took 9.6 seconds.
Something on the way to storage is bad (kernel, network, storage).
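
The measurement can be reproduced with a single 4 KiB read from the start of the LV, similar to a dd iflag=direct read. This is a sketch under assumptions, not vdsm's monitoring code: `read_delay` is a hypothetical helper, and it uses O_DIRECT (Linux only) to bypass the page cache, which requires an aligned buffer; an anonymous mmap is page-aligned.

```python
import mmap
import os
import time


def read_delay(path: str, size: int = 4096, direct: bool = True) -> float:
    """Time one read of `size` bytes from the start of `path`, in seconds.

    With direct=True the read uses O_DIRECT (Linux only) to bypass the page
    cache; O_DIRECT needs an aligned buffer, and an anonymous mmap is
    page-aligned. `size` should be a multiple of the device block size.
    """
    flags = os.O_RDONLY
    if direct:
        flags |= os.O_DIRECT
    fd = os.open(path, flags)
    buf = mmap.mmap(-1, size)  # anonymous, page-aligned read buffer
    try:
        start = time.monotonic()
        nread = os.readv(fd, [buf])
        elapsed = time.monotonic() - start
    finally:
        buf.close()
        os.close(fd)
    if nread != size:
        raise OSError(f"short read: {nread} of {size} bytes")
    return elapsed
```

For example, run `read_delay("/dev/VG_NAME/metadata")` as root on an affected host a few times, where VG_NAME is a placeholder for the domain's volume group, to see whether delays like the 9.6 seconds above recur.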

> The problem mainly affected two storage domains (on the same array) but I
> also saw single messages for other storage domains (on the other array as
> well).
> Storage domains stayed available to the hosts, all VMs continued to run.

We have a 20-second grace time (4 retries, 5 seconds per retry) in multipath
when there are no active paths, before I/O fails, pausing the VM. We also
resume paused VMs when storage monitoring works again, so maybe the VMs were
paused and resumed.
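
The 20-second figure is the number of retries multiplied by the time per retry. A minimal sketch, assuming the retry count and interval correspond to the `no_path_retry` and `polling_interval` settings in /etc/multipath.conf; the values 4 and 5 come from the text above, so check what is actually deployed (`multipathd show config` prints the running configuration). The helper name is illustrative.

```python
def no_path_grace_seconds(no_path_retry: int, polling_interval: int) -> int:
    """Approximate how long multipath queues I/O once all paths are down.

    Assumes one retry per path-checker polling interval; after the last
    retry, queued I/O fails and a VM using the device would pause.
    """
    if no_path_retry < 0 or polling_interval <= 0:
        raise ValueError("need no_path_retry >= 0 and polling_interval > 0")
    return no_path_retry * polling_interval


print(no_path_grace_seconds(no_path_retry=4, polling_interval=5))  # 20
```

`no_path_retry` also accepts the keywords `queue` and `fail`; the sketch handles only the numeric form.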

However, for storage monitoring we have a strict 10-second timeout. If reading
from the metadata LV times out or fails and does not operate normally after
5 minutes, the domain will become inactive.
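
A minimal model of this rule, for illustration only: each check either succeeds within the 10-second timeout or fails, and the domain is reported inactive once checks have been failing continuously for 5 minutes. `DomainMonitor` and the state names are hypothetical; this is not vdsm or engine code, and a read that never completes is represented as None.

```python
from __future__ import annotations

from dataclasses import dataclass

CHECK_TIMEOUT = 10.0    # seconds allowed for one metadata read (from the text above)
INACTIVE_AFTER = 300.0  # 5 minutes of continuous failure before the domain goes inactive


@dataclass
class DomainMonitor:
    """Hypothetical model of the rule above, not vdsm or engine code."""

    failing_since: float | None = None

    def record(self, now: float, read_delay: float | None) -> str:
        """Record one check at time `now` (seconds).

        `read_delay` is the measured read time, or None if the read failed.
        Returns "ok", "failing" or "inactive".
        """
        if read_delay is not None and read_delay <= CHECK_TIMEOUT:
            self.failing_since = None  # any good read resets the clock
            return "ok"
        if self.failing_since is None:
            self.failing_since = now
        if now - self.failing_since >= INACTIVE_AFTER:
            return "inactive"
        return "failing"


m = DomainMonitor()
for t, delay in [(0, 0.002), (10, 9.6), (20, None), (320, 12.0)]:
    print(t, m.record(t, delay))
```

In this model the 9.6-second read at t=10 still counts as a success, while the check at t=320 marks the domain inactive because checks have been failing since t=20.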

> When constantly reading from the storage domains (/bin/dd iflag=direct
> if=  bs=4096 count=1 of=/dev/null) we got the expected 20+ MBytes/s on
> all but a few storage domains. One of them showed “transfer rates” around 200
> Bytes/s, but went up to normal performance from time to time. The transfer
> rate to this domain differed between the hosts.

This can explain the read timeouts.
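
The arithmetic behind this: at the rate reported for the slow domain, a single 4 KiB read no longer fits in the 10-second monitoring timeout. A small sketch, treating 20 MBytes/s as 20,000,000 bytes/s; `seconds_per_block` is an illustrative name.

```python
BLOCK_SIZE = 4096       # bytes per read, as in dd bs=4096 count=1
MONITOR_TIMEOUT = 10.0  # seconds, the storage monitoring timeout mentioned earlier


def seconds_per_block(rate_bytes_per_s: float, block_size: int = BLOCK_SIZE) -> float:
    """Time needed to read one block at a sustained transfer rate."""
    if rate_bytes_per_s <= 0:
        raise ValueError("rate must be positive")
    return block_size / rate_bytes_per_s


for rate in (20_000_000, 200):  # healthy domains vs. the slow domain
    t = seconds_per_block(rate)
    verdict = "exceeds" if t > MONITOR_TIMEOUT else "is within"
    print(f"{rate} B/s: {t:.4f} s per 4 KiB read, {verdict} the timeout")
```

At 200 bytes/s one read takes 4096 / 200 = 20.48 seconds, about twice the timeout, while at 20 MBytes/s it takes about 0.2 milliseconds.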

> /var/log/messages contains qla2xxx abort messages on almost all hosts. There
> are no errors on SAN switches or storage array (but vendor is still 
> investigating). I did not see high load on the storage array.
> The system seemed to stabilize when I stopped all VMs on the affected storage 
> domain and this storage domain became “inactive”.

This looks like the right way to troubleshoot this.

> Currently, this storage domain is still inactive and we can neither place it
> in maintenance mode (“Failed to deactivate Storage Domain”) nor activate it.

We need vdsm logs to understand this failure.

> OVF Metadata seems to be corrupt as well (failed to update OVF disks , 
> OVF data isn't updated on those OVF stores).

This does not mean the OVF is corrupted, only that we could not store new
data. The older data on the other OVFSTORE disk is probably fine. Hopefully
the system will not try to write to the other OVFSTORE disk, overwriting the
last good version.

> The first six 512 byte blocks of /dev//metadata seem to contain only 
> zeros.

This is normal, the first 2048 bytes are always zeroes. This area was
used for domain metadata in older versions.
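
To confirm this on a host, count the zero bytes at the start of the metadata LV. A sketch, with the function name and the 1 MiB read limit chosen for illustration; a count of at least 2048 is consistent with the note above, but says nothing about whether the rest of the metadata is valid. Buffered reads like these may be served from the page cache, unlike dd iflag=direct reads.

```python
def leading_zero_bytes(path: str, limit: int = 1024 * 1024) -> int:
    """Count zero bytes at the start of `path`, reading at most `limit` bytes."""
    count = 0
    with open(path, "rb") as f:
        while count < limit:
            chunk = f.read(min(4096, limit - count))
            if not chunk:
                break  # end of file
            rest = chunk.lstrip(b"\x00")
            count += len(chunk) - len(rest)
            if rest:
                break  # found the first non-zero byte
    return count
```

For example, `leading_zero_bytes("/dev/VG_NAME/metadata")` run as root, where VG_NAME is a placeholder for the domain's volume group.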

> Any advice on how to proceed here?
>
> Is there a way to recover this storage domain?

Please share more details:

- output of "lsblk"
- output of "multipath -ll"
- output of "/usr/libexec/vdsm/fc-scan -v"
- output of "vgs -o +tags problem-domain-id"
- output of "lvs -o +tags problem-domain-id"
- contents of /etc/multipath.conf
- contents of /etc/multipath.conf.d/*.conf
- /var/log/messages since the issue started
- /var/log/vdsm/vdsm.log* since the issue started on one of the hosts

A bug is probably the best place to keep these logs and make them easy to track.
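
Collecting the items above on one host can be scripted. A minimal sketch, assuming Python 3 and root privileges on the host; `problem-domain-id` is kept as the placeholder from the list and must be replaced, and the function names and output file names are hypothetical. Log files are copied whole; trimming them to the time since the issue started is not handled.

```python
from __future__ import annotations

import glob
import shutil
import subprocess
from pathlib import Path

DOMAIN_ID = "problem-domain-id"  # placeholder from the list above: replace it

COMMANDS = {
    "lsblk.txt": ["lsblk"],
    "multipath-ll.txt": ["multipath", "-ll"],
    "fc-scan.txt": ["/usr/libexec/vdsm/fc-scan", "-v"],
    "vgs.txt": ["vgs", "-o", "+tags", DOMAIN_ID],
    "lvs.txt": ["lvs", "-o", "+tags", DOMAIN_ID],
}

FILES = [
    "/etc/multipath.conf",
    "/etc/multipath.conf.d/*.conf",
    "/var/log/messages",
    "/var/log/vdsm/vdsm.log*",
]


def run_commands(outdir: Path, commands: dict[str, list[str]] = COMMANDS) -> dict[str, int]:
    """Save each command's combined stdout/stderr to outdir/<name>.

    Returns {name: exit status}; -1 means not found, -2 means timed out.
    """
    outdir.mkdir(parents=True, exist_ok=True)
    status: dict[str, int] = {}
    for name, argv in commands.items():
        try:
            proc = subprocess.run(argv, stdout=subprocess.PIPE,
                                  stderr=subprocess.STDOUT, timeout=600)
        except FileNotFoundError:
            (outdir / name).write_text(f"command not found: {argv[0]}\n")
            status[name] = -1
            continue
        except subprocess.TimeoutExpired:
            (outdir / name).write_text(f"timed out: {' '.join(argv)}\n")
            status[name] = -2
            continue
        (outdir / name).write_bytes(proc.stdout)
        status[name] = proc.returncode
    return status


def copy_files(outdir: Path, patterns: list[str] = FILES) -> list[Path]:
    """Copy every file matching the glob patterns into outdir."""
    outdir.mkdir(parents=True, exist_ok=True)
    copied = []
    for pattern in patterns:
        for src in sorted(glob.glob(pattern)):
            copied.append(Path(shutil.copy2(src, outdir)))
    return copied
```

For example, call `run_commands(Path("/tmp/sd-report"))` and `copy_files(Path("/tmp/sd-report"))` (a hypothetical output directory) and attach the resulting files to the bug.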

Thanks,
Nir
___
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/CLVTY3WNCTYDT2P4PQWQBXVCBTB5DCGX/