[ovirt-users] Re: oVirt Nodes 'Setting Host state to Non-Operational' - looking for the cause.

2022-03-17 Thread Strahil Nikolov via Users
If there is a bug in Gluster, a gfid split brain is still possible. Check the
file attributes of the affected file. A stale file handle can be identified in
the FUSE mount point /rhev/long/path/to/storage/mounted by a missing user,
group and size (all shown as '???' in the ls -l output).
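To make that check concrete, here is a minimal shell sketch that lists directory entries which appear in a directory listing but fail stat(), which is how the '???' entries show up. The mount path in the comment is only an illustration; point the function at your own storage domain mount.

```shell
# List entries that readdir() returns but stat() rejects -- on a Gluster FUSE
# mount these are the candidates for stale file handles (shown as ??? by ls -l).
# -L dereferences symlinks so dangling links are flagged as well.
list_stale() {
  dir="$1"
  ls -A -- "$dir" 2>/dev/null | while IFS= read -r name; do
    stat -L -- "$dir/$name" >/dev/null 2>&1 || printf '%s\n' "$dir/$name"
  done
}
# e.g. list_stale /rhev/data-center/mnt/glusterSD/<server>:_<volume>
```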
Best Regards,
Strahil Nikolov
 
 
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/U4HJHZZUVTBRRVGRZPPXACIEAHEX2XJ7/


[ovirt-users] Re: oVirt Nodes 'Setting Host state to Non-Operational' - looking for the cause.

2022-03-16 Thread simon
Thanks Strahil,

The Environment is as follows:

oVirt Open Virtualization Manager:
Software Version: 4.4.9.5-1.el8

oVirt Node:
OS Version: RHEL - 8.4.2105.0 - 3.el8
OS Description: oVirt Node 4.4.6
GlusterFS Version: glusterfs-8.5-1.el8

The Volumes are Arbiter (2+1) volumes so split brain should not be an issue.

Regards

Simon...
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/2AKCCVWH7GRLAVISA2KQAXSMTKTVNVX4/


[ovirt-users] Re: oVirt Nodes 'Setting Host state to Non-Operational' - looking for the cause.

2022-03-16 Thread Strahil Nikolov via Users
Stale file handle is an indication of a split-brain situation. On a 3-way
replica, this could only mean a gfid mismatch (the gfid is a unique ID for each
file in Gluster).
I think those .prob files can be deleted safely, but I am not fully convinced.
What version of oVirt are you using? What Gluster version?
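For reference, the gfid can be read off each brick with getfattr (run against the brick path on each replica node, not the FUSE mount); the brick path in the comment is hypothetical. A tiny helper makes the comparison explicit, and `gluster volume heal <vol> info split-brain` should list such files too.

```shell
# On each replica node, read the file's gfid from the brick path, e.g.:
#   getfattr -n trusted.gfid -e hex /gluster_bricks/data03/data03/<path/to/file>
# (the brick path is an example -- use your own brick layout).
# A gfid split brain means the hex values differ between bricks.
gfid_match() {
  # both values present and identical
  [ -n "$1" ] && [ "$1" = "$2" ]
}
```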
Best Regards,
Strahil Nikolov
 
 
2 days ago I found that 2 of the 3 oVirt nodes had been set to
'Non-Operational'. GlusterFS seemed to be OK from the command line, but the
oVirt engine WebUI was reporting 2 out of 3 bricks per volume as down, and the
event logs were filling up with the following types of messages.


Failed to connect Host ddmovirtprod03 to the Storage Domains data03.
The error message for connection ddmovirtprod03-strg:/data03 returned by VDSM 
was: Problem while trying to mount target
Failed to connect Host ddmovirtprod03 to Storage Server
Host ddmovirtprod03 cannot access the Storage Domain(s) data03 attached to the 
Data Center DDM_Production_DC. Setting Host state to Non-Operational.
Failed to connect Host ddmovirtprod03 to Storage Pool 


Host ddmovirtprod01 reports about one of the Active Storage Domains as 
Problematic.
Host ddmovirtprod01 cannot access the Storage Domain(s) data03 attached to the 
Data Center DDM_Production_DC. Setting Host state to Non-Operational.
Failed to connect Host ddmovirtprod01 to Storage Pool DDM_Production_DC
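The "bricks down" report in the UI can be cross-checked from the command line. A sketch that counts non-online bricks by parsing `gluster volume status <vol> detail` output (the exact "Online ... : Y" line format is an assumption about this Gluster release):

```shell
# Count bricks that are not online according to gluster's detailed status.
offline_bricks() {
  awk -F':' '/^Online/ { v = $2; gsub(/[ \t]/, "", v); if (v != "Y") n++ }
             END { print n + 0 }'
}
# e.g. gluster volume status data03 detail | offline_bricks
```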


The following is from the vdsm.log on host01:

[root@ddmovirtprod01 vdsm]# tail -f /var/log/vdsm/vdsm.log | grep "WARN"
2022-03-15 11:37:14,299+ WARN (ioprocess/232748) [IOProcess] 
(6bf1ef03-77e1-423b-850e-9bb6030b590d) Failed to create a probe file: 
'/rhev/data-center/mnt/glusterSD/ddmovirtprod03-strg:data03/.prob-6c101766-4e5d-40c6-8fa8-0f7e3b3e931e',
 error: 'Stale file handle' (init:461)
2022-03-15 11:37:24,313+ WARN (ioprocess/232748) [IOProcess] 
(6bf1ef03-77e1-423b-850e-9bb6030b590d) Failed to create a probe file: 
'/rhev/data-center/mnt/glusterSD/ddmovirtprod03-strg:_data03/.prob-c3fa017b-94dc-47d1-89a4-8ee046509a32',
 error: 'Stale file handle' (init:461)
2022-03-15 11:37:34,325+ WARN (ioprocess/232748) [IOProcess] 
(6bf1ef03-77e1-423b-850e-9bb6030b590d) Failed to create a probe file: 
'/rhev/data-center/mnt/glusterSD/ddmovirtprod03-strg:_data03/.prob-e173ecac-4d4d-4b59-a437-61eb5d0beb83',
 error: 'Stale file handle' (init:461)
2022-03-15 11:37:44,337+ WARN (ioprocess/232748) [IOProcess] 
(6bf1ef03-77e1-423b-850e-9bb6030b590d) Failed to create a probe file: 
'/rhev/data-center/mnt/glusterSD/ddmovirtprod03-strg:_data03/.prob-baf13698-0f43-4672-90a4-86cecdf9f8d0',
 error: 'Stale file handle' (init:461)
2022-03-15 11:37:54,350+ WARN (ioprocess/232748) [IOProcess] 
(6bf1ef03-77e1-423b-850e-9bb6030b590d) Failed to create a probe file: 
'/rhev/data-center/mnt/glusterSD/ddmovirtprod03-strg:_data03/.prob-1e92fdfd-d8e9-48b4-84a9-a2b84fc0d14c',
 error: 'Stale file handle' (init:461)
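The WARN lines above can be reduced to the set of affected mount points with a short pipeline, which helps when more than one storage domain is involved:

```shell
# Extract the unique glusterSD mount points from vdsm.log lines that report
# a failed probe file, as in the WARN messages above.
extract_stale_mounts() {
  grep 'Failed to create a probe file' \
    | grep -o '/rhev/data-center/mnt/glusterSD/[^/]*' \
    | sort -u
}
# e.g. extract_stale_mounts < /var/log/vdsm/vdsm.log
```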


After trying different methods to resolve without success I did the following.

1. Moved any VM disks using Storage Domain data03 onto other Storage Domains.
2. Placed the data03 Storage Domain into Maintenance mode.
3. Placed host03 into Maintenance mode, stopping Gluster services and rebooting.
4. Ensured all bricks were up, the peers were connected and healing had started.
5. Once Gluster volumes were healed I activated host03, at which point host01 
also activated.
6. Host01 was showing as disconnected on most bricks, so I rebooted it, which
resolved the issue.
7. I activated Storage Domain data03 without issue.
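Steps 4 and 5 above can be scripted. A sketch that sums the pending-heal counters from `gluster volume heal <vol> info` output (the "Number of entries:" line is the standard heal-info format):

```shell
# Sum the "Number of entries:" counters across bricks in heal-info output;
# zero means the volume has no pending heals.
heal_pending() {
  awk -F': ' '/^Number of entries:/ { n += $2 } END { print n + 0 }'
}
# Example wait loop (watch it rather than leaving it unattended):
#   while [ "$(gluster volume heal data03 info | heal_pending)" -gt 0 ]; do
#     sleep 30
#   done
```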

The system has been left for 24hrs with no further issues.

The issue is now resolved, but it would be helpful to know what caused the
problems with the Storage Domain data03, and where to look to confirm it.

Regards

Simon...
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/TVSGTE5GEIINNZ7QOF6V3PYFRHZTU66S/