'Stale file handle' is an indication of a split-brain situation. On a 3-way
replica, this could only mean a gfid mismatch (the gfid is the unique ID of each
file in Gluster).
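One way to confirm a gfid mismatch is to read the `trusted.gfid` extended attribute of the same file on each of the three bricks and compare the values. The sketch below assumes you have already collected those values (for example with `getfattr -n trusted.gfid -e hex <brick-path>/<file>` on each node, where `<brick-path>` and `<file>` are placeholders for your own layout); `check_gfids` is a hypothetical helper that only compares what you pass it.

```shell
# Hypothetical helper: report whether the gfid values read from each replica
# brick for the same file agree. Collect the values first, e.g. on each node:
#   getfattr -n trusted.gfid -e hex <brick-path>/<file>
# (<brick-path> and <file> are placeholders for your own brick layout.)
check_gfids() {
    if [ "$#" -eq 0 ]; then
        echo "usage: check_gfids GFID..." >&2
        return 2
    fi
    # One distinct value means every replica carries the same gfid.
    distinct=$(printf '%s\n' "$@" | sort -u | wc -l | tr -d ' ')
    if [ "$distinct" -eq 1 ]; then
        echo "gfid consistent"
    else
        echo "gfid mismatch ($distinct distinct values)"
        return 1
    fi
}
```

Separately, `gluster volume heal data03 info split-brain` lists the entries that Gluster itself has flagged as split-brain.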
I think those .prob files can be deleted safely, but I am not fully convinced.
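Before deciding whether to delete anything, a read-only sketch like the following could list the leftover probe files. It assumes the directory passed in is the storage domain mount point shown in the vdsm.log entries; `list_probe_files` is a hypothetical helper, and on a mount that is currently returning 'Stale file handle' the listing itself may fail.

```shell
# Hypothetical read-only helper: list leftover vdsm probe files (.prob-*)
# directly under a storage domain mount point. Nothing is deleted.
list_probe_files() {
    if [ ! -d "$1" ]; then
        echo "not a directory: $1" >&2
        return 2
    fi
    # -maxdepth 1: only the top level of the mount, where the probes are created.
    find "$1" -maxdepth 1 -type f -name '.prob-*' | sort
}
```

For example: `list_probe_files '/rhev/data-center/mnt/glusterSD/ddmovirtprod03-strg:_data03'`.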
Which version of oVirt are you using? And which version of Gluster?
Best Regards,
Strahil Nikolov
Two days ago I found that 2 of the 3 oVirt nodes had been set to
'Non-Operational'. GlusterFS seemed to be OK from the command line, but the
oVirt Engine web UI was reporting 2 of the 3 bricks per volume as down, and the
event logs were filling up with the following types of messages:
Failed to connect Host ddmovirtprod03 to the Storage Domains data03.
The error message for connection ddmovirtprod03-strg:/data03 returned by VDSM was: Problem while trying to mount target
Failed to connect Host ddmovirtprod03 to Storage Servers
Host ddmovirtprod03 cannot access the Storage Domain(s) data03 attached to the Data Center DDM_Production_DC. Setting Host state to Non-Operational.
Failed to connect Host ddmovirtprod03 to Storage Pool
Host ddmovirtprod01 reports about one of the Active Storage Domains as Problematic.
Host ddmovirtprod01 cannot access the Storage Domain(s) data03 attached to the Data Center DDM_Production_DC. Setting Host state to Non-Operational.
Failed to connect Host ddmovirtprod01 to Storage Pool DDM_Production_DC
The following is from the vdsm.log on host01:
[root@ddmovirtprod01 vdsm]# tail -f /var/log/vdsm/vdsm.log | grep "WARN"
2022-03-15 11:37:14,299+ WARN (ioprocess/232748) [IOProcess]
(6bf1ef03-77e1-423b-850e-9bb6030b590d) Failed to create a probe file:
'/rhev/data-center/mnt/glusterSD/ddmovirtprod03-strg:_data03/.prob-6c101766-4e5d-40c6-8fa8-0f7e3b3e931e',
error: 'Stale file handle' (__init__:461)
2022-03-15 11:37:24,313+ WARN (ioprocess/232748) [IOProcess]
(6bf1ef03-77e1-423b-850e-9bb6030b590d) Failed to create a probe file:
'/rhev/data-center/mnt/glusterSD/ddmovirtprod03-strg:_data03/.prob-c3fa017b-94dc-47d1-89a4-8ee046509a32',
error: 'Stale file handle' (__init__:461)
2022-03-15 11:37:34,325+ WARN (ioprocess/232748) [IOProcess]
(6bf1ef03-77e1-423b-850e-9bb6030b590d) Failed to create a probe file:
'/rhev/data-center/mnt/glusterSD/ddmovirtprod03-strg:_data03/.prob-e173ecac-4d4d-4b59-a437-61eb5d0beb83',
error: 'Stale file handle' (__init__:461)
2022-03-15 11:37:44,337+ WARN (ioprocess/232748) [IOProcess]
(6bf1ef03-77e1-423b-850e-9bb6030b590d) Failed to create a probe file:
'/rhev/data-center/mnt/glusterSD/ddmovirtprod03-strg:_data03/.prob-baf13698-0f43-4672-90a4-86cecdf9f8d0',
error: 'Stale file handle' (__init__:461)
2022-03-15 11:37:54,350+ WARN (ioprocess/232748) [IOProcess]
(6bf1ef03-77e1-423b-850e-9bb6030b590d) Failed to create a probe file:
'/rhev/data-center/mnt/glusterSD/ddmovirtprod03-strg:_data03/.prob-1e92fdfd-d8e9-48b4-84a9-a2b84fc0d14c',
error: 'Stale file handle' (__init__:461)
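Rather than following `tail -f`, a sketch like the one below could summarize how many probe-file failures hit each mount point over the whole log. It assumes each vdsm.log entry sits on a single line (the entries above are wrapped by the email) and that mount paths contain no spaces; `summarize_probe_failures` is a hypothetical helper.

```shell
# Hypothetical helper: read vdsm.log on stdin and count "Stale file handle"
# probe-file failures per storage domain mount point.
# Output: one "<mount-point> <count>" line per affected mount.
summarize_probe_failures() {
    grep 'Failed to create a probe file' |
        grep 'Stale file handle' |
        # Keep only the mount point, i.e. the quoted path minus "/.prob-<uuid>".
        sed -n "s|.*'\(/rhev/data-center/mnt/[^']*\)/\.prob-[^']*'.*|\1|p" |
        sort | uniq -c |
        awk '{ print $2, $1 }'
}
```

For example: `summarize_probe_failures < /var/log/vdsm/vdsm.log`, run on each host to see which hosts and mounts were affected.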
After trying several approaches without success, I did the following:
1. Moved all VM disks stored on Storage Domain data03 to other Storage Domains.
2. Placed the data03 Storage Domain into Maintenance mode.
3. Placed host03 into Maintenance mode, stopped the Gluster services and
rebooted it.
4. Ensured all bricks were up, the peers were connected and healing had started.
5. Once the Gluster volumes had healed, I activated host03, at which point host01
also activated.
6. Host01 was showing as disconnected on most bricks, so I rebooted it, which
resolved this.
7. I activated Storage Domain data03 without issue.
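For step 4, `gluster peer status` and `gluster volume status data03` show peer and brick state, and `gluster volume heal data03 info` lists pending heals per brick with a "Number of entries:" line. The sketch below, a hypothetical helper named `pending_heal_entries`, totals those counts; it assumes that output format and treats a non-numeric count (such as "-") as a brick that could not be queried.

```shell
# Hypothetical helper: read the output of `gluster volume heal <VOL> info`
# on stdin and print the total number of entries still pending heal.
# Exits non-zero if any brick reported a non-numeric count (e.g. "-"),
# possibly meaning that brick could not be queried.
pending_heal_entries() {
    awk -F': *' '
        /^Number of entries:/ {
            if ($2 ~ /^[0-9]+$/) total += $2
            else unknown++
        }
        END {
            print total + 0
            if (unknown) exit 1
        }'
}
```

For example: `gluster volume heal data03 info | pending_heal_entries`; a result of 0 with a zero exit status would suggest healing has finished.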
The system has been running for 24 hours since then with no further issues.
The issue is now resolved, but it would be helpful to know what caused the
problems with Storage Domain data03 and where I should look to confirm it.
Regards
Simon...
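On where to look: besides /var/log/vdsm/vdsm.log on each host, the GlusterFS logs under /var/log/glusterfs/ are the usual place (brick logs in the bricks/ subdirectory, the self-heal daemon in glustershd.log). The FUSE client log for a mount is, by default, named after its mount point. The sketch below is a hypothetical helper, `gluster_client_log`, that derives that name, assuming default log settings.

```shell
# Hypothetical helper: print the path of the GlusterFS FUSE client log for a
# mount point, assuming the default naming (leading "/" dropped, remaining
# "/" replaced by "-", stored under /var/log/glusterfs/).
gluster_client_log() {
    name=$(printf '%s' "${1#/}" | tr '/' '-')
    printf '/var/log/glusterfs/%s.log\n' "$name"
}
```

For example, `grep -iE 'stale|gfid|split-brain' "$(gluster_client_log '/rhev/data-center/mnt/glusterSD/ddmovirtprod03-strg:_data03')"` on host01 and host03, looking around the time the hosts went Non-Operational.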
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct:
https://www.ovirt.org/community/about/community-guidelines/
List Archives:
https://lists.ovirt.org/archives/list/users@ovirt.org/message/55XNGNKOGS3ONWTWDGGJSBORZ2D2MZUT/
List Archives:
https://lists.ovirt.org/archives/list/users@ovirt.org/message/TVSGTE5GEIINNZ7QOF6V3PYFRHZTU66S/