[ovirt-users] Re: Gluster volume not responding
Most probably, but I have no clue. You can set the host into maintenance and then activate it, so the volume gets mounted properly.

Best Regards,
Strahil Nikolov

On Friday, 23 October 2020 at 03:16:42 GMT+3, Simon Scott wrote:

Hi Strahil,

All networking configs have been checked and are correct. I just looked at the gluster volume and noticed the mount option 'logbsize=256k' is present on two nodes but missing on the third:

Status of volume: pltfm_data01
------------------------------------------------------------------------------
Brick            : Brick bdtpltfmovt01-strg:/gluster_bricks/pltfm_data01/pltfm_data01
TCP Port         : 49152
RDMA Port        : 0
Online           : Y
Pid              : 24372
File System      : xfs
Device           : /dev/mapper/gluster_vg_sdb-gluster_lv_pltfm_data01
Mount Options    : rw,seclabel,noatime,nodiratime,attr2,inode64,logbsize=256k,sunit=512,swidth=512,noquota
Inode Size       : 512
Disk Space Free  : 552.0GB
Total Disk Space : 1.5TB
Inode Count      : 157286400
Free Inodes      : 157245903
------------------------------------------------------------------------------
Brick            : Brick bdtpltfmovt02-strg:/gluster_bricks/pltfm_data01/pltfm_data01
TCP Port         : 49152
RDMA Port        : 0
Online           : Y
Pid              : 24485
File System      : xfs
Device           : /dev/mapper/gluster_vg_sdb-gluster_lv_pltfm_data01
Mount Options    : rw,seclabel,noatime,nodiratime,attr2,inode64,logbsize=256k,sunit=512,swidth=512,noquota
Inode Size       : 512
Disk Space Free  : 552.0GB
Total Disk Space : 1.5TB
Inode Count      : 157286400
Free Inodes      : 157245885
------------------------------------------------------------------------------
Brick            : Brick bdtpltfmovt03-strg:/gluster_bricks/pltfm_data01/pltfm_data01
TCP Port         : 49152
RDMA Port        : 0
Online           : Y
Pid              : 24988
File System      : xfs
Device           : /dev/mapper/gluster_vg_sdb-gluster_lv_pltfm_data01
Mount Options    : rw,seclabel,noatime,nodiratime,attr2,inode64,sunit=512,swidth=512,noquota
Inode Size       : 512
Disk Space Free  : 552.0GB
Total Disk Space : 1.5TB
Inode Count      : 157286400
Free Inodes      : 157245890

Is this possibly causing the instability issues we are experiencing under load?

Regards

Simon...

> On 11 Oct 2020, at 19:18, Strahil Nikolov wrote:
>
> Hi Simon,
>
> Usually it is the network, but you need real-world data.
> I would open screen sessions and run ping continuously. Something like this:
>
> while true; do echo -n "$(date) "; timeout -s 9 1 ping -c 1 ovirt2 | grep icmp_seq; sleep 1; done | tee -a /tmp/icmp_log
>
> Are all systems in the same network?
> What about DNS resolution - do you have entries in /etc/hosts?
>
> Best Regards,
> Strahil Nikolov
>
> On Sunday, 11 October 2020 at 11:54:47 GMT+3, Simon Scott wrote:
>
> Thanks Strahil.
>
> I have found between 1 & 4 Gluster peer "rpc-clnt-ping timer expired" messages in the rhev-data-center-mnt-glusterSD-hostname-strg:_pltfm_data01.log on the storage network IP. Of the 6 hosts, only 1 does not have these timeouts.
>
> Fencing has been disabled, but can you identify which logs are key to identifying the cause, please?
>
> It's a bonded (bond1) 10GbE ovirt-mgmt logical network and Prod VM VLAN interface AND a bonded (bond2) 10GbE Gluster storage network.
> Dropped packets are seen incrementing in vdsm.log, but neither ethtool -S nor the kernel logs show dropped packets. I am wondering if they are being dropped because the ring buffers are small.
>
> Kind Regards
>
> Shimme
>
> From: Strahil Nikolov
> Sent: Thursday 8 October 2020 20:40
> To: users@ovirt.org ; Simon Scott
> Subject: Re: [ovirt-users] Gluster volume not responding
>
>> Every Monday and Wednesday morning there are gluster connectivity timeouts,
>> but all checks of the network and network configs are ok.
>
> Based on this one I make the following conclusions:
> 1. The issue is recurring
> 2. You most probably have a network issue
>
> Have you checked the following:
> - are there any ping timeouts between FUSE clients and gluster nodes?
> - have you tried disabling fencing and checking the logs after the issue reoccurs?
> - are you sharing the Backup and Prod networks? Is it possible for some backup/other production load in your environment to "black out" your oVirt?
> - have you checked the gluster cluster's logs for anything meaningful?
>
> Best Regards,
> Strahil Nikolov

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/U527TGUQR6RV7Z426NWMO3K4OXQJABCM/
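To confirm the logbsize mismatch mechanically, the "Mount Options" lines from `gluster volume status pltfm_data01 detail` can be counted; any option set whose count is lower than the brick count is the odd one out. A minimal sketch (the sample text below stands in for the real command output, and the option strings are abbreviated):

```shell
# Count identical "Mount Options" lines; the brick whose option set
# appears only once is the misconfigured one.
status='Mount Options : rw,noatime,inode64,logbsize=256k
Mount Options : rw,noatime,inode64,logbsize=256k
Mount Options : rw,noatime,inode64'
printf '%s\n' "$status" | awk -F' : ' '/Mount Options/ {print $2}' | sort | uniq -c
# On a real node, pipe the command output in instead:
#   gluster volume status pltfm_data01 detail | awk -F' : ' '/Mount Options/ {print $2}' | sort | uniq -c
```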
[ovirt-users] Re: Gluster volume not responding
Hi Simon,

I doubt the system needs tuning from a network perspective. I guess you can run some 'screen' sessions which are pinging another system and logging everything to a file.

Best Regards,
Strahil Nikolov

On Friday, 9 October 2020 at 01:05:22 GMT+3, Simon Scott wrote:

> Thanks Strahil.
>
> I have found between 1 & 4 Gluster peer "rpc-clnt-ping timer expired" messages in the rhev-data-center-mnt-glusterSD-hostname-strg:_pltfm_data01.log on the storage network IP. Of the 6 hosts, only 1 does not have these timeouts.
>
> Fencing has been disabled, but can you identify which logs are key to identifying the cause, please?
>
> It's a bonded (bond1) 10GbE ovirt-mgmt logical network and Prod VM VLAN interface AND a bonded (bond2) 10GbE Gluster storage network.
> Dropped packets are seen incrementing in vdsm.log, but neither ethtool -S nor the kernel logs show dropped packets. I am wondering if they are being dropped because the ring buffers are small.
>
> Kind Regards
>
> Shimme
>
> From: Strahil Nikolov
> Sent: Thursday 8 October 2020 20:40
> To: users@ovirt.org ; Simon Scott
> Subject: Re: [ovirt-users] Gluster volume not responding
>
>> Every Monday and Wednesday morning there are gluster connectivity timeouts,
>> but all checks of the network and network configs are ok.
>
> Based on this one I make the following conclusions:
> 1. The issue is recurring
> 2. You most probably have a network issue
>
> Have you checked the following:
> - are there any ping timeouts between FUSE clients and gluster nodes?
> - have you tried disabling fencing and checking the logs after the issue reoccurs?
> - are you sharing the Backup and Prod networks? Is it possible for some backup/other production load in your environment to "black out" your oVirt?
> - have you checked the gluster cluster's logs for anything meaningful?
> Best Regards,
> Strahil Nikolov

List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/52YVULUR3YV4AQLKPLRN3OZ3JC4V4RZO/
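On the suspicion that packets are dropped because the ring buffers are small: `ethtool -g <nic>` reports both the hardware maximums and the current ring sizes, so the two can be compared before raising anything. A sketch that parses sample `ethtool -g` output (interface name and values are illustrative, not from the thread):

```shell
# Compare the current RX ring size against the hardware maximum.
# "sample" mimics typical `ethtool -g ens1f0` output.
sample='Ring parameters for ens1f0:
Pre-set maximums:
RX:             4096
TX:             4096
Current hardware settings:
RX:             512
TX:             512'
printf '%s\n' "$sample" | awk '
    /Pre-set maximums/ {sec = "max"}
    /Current hardware/ {sec = "cur"}
    $1 == "RX:" && sec == "max" {max = $2}
    $1 == "RX:" && sec == "cur" {cur = $2}
    END {printf "RX ring %d of %d%s\n", cur, max,
         (cur < max ? " (can be raised with ethtool -G)" : "")}'
```

On a live host you would feed `ethtool -g <nic>` for each bond slave into the same awk script; rings are raised with `ethtool -G <nic> rx <n> tx <n>`, staying within the reported maximums.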
[ovirt-users] Re: Gluster volume not responding
I have seen many "checks" that are "OK"... Have you checked that backups are not running over the same network?

I would disable the power management (fencing), so I can find out what has happened to the systems.

Best Regards,
Strahil Nikolov

On Thursday, 8 October 2020 at 22:43:34 GMT+3, Strahil Nikolov via Users wrote:

>> Every Monday and Wednesday morning there are gluster connectivity timeouts,
>> but all checks of the network and network configs are ok.
>
> Based on this one I make the following conclusions:
> 1. The issue is recurring
> 2. You most probably have a network issue
>
> Have you checked the following:
> - are there any ping timeouts between FUSE clients and gluster nodes?
> - have you tried disabling fencing and checking the logs after the issue reoccurs?
> - are you sharing the Backup and Prod networks? Is it possible for some backup/other production load in your environment to "black out" your oVirt?
> - have you checked the gluster cluster's logs for anything meaningful?
>
> Best Regards,
> Strahil Nikolov

List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/HKJVBXDIN4DJ7LKFDQY6RBWFY5X2U6XW/
[ovirt-users] Re: Gluster volume not responding
> Every Monday and Wednesday morning there are gluster connectivity timeouts,
> but all checks of the network and network configs are ok.

Based on this one I make the following conclusions:
1. The issue is recurring
2. You most probably have a network issue

Have you checked the following:
- are there any ping timeouts between FUSE clients and gluster nodes?
- have you tried disabling fencing and checking the logs after the issue reoccurs?
- are you sharing the Backup and Prod networks? Is it possible for some backup/other production load in your environment to "black out" your oVirt?
- have you checked the gluster cluster's logs for anything meaningful?

Best Regards,
Strahil Nikolov

List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/TIDHSP34LVYUIDUU76OX3PFDEHL7LSWE/
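The first check in the list (ping timeouts between clients and nodes) is most useful when the log can show whether losses cluster around the Monday/Wednesday windows. A sketch of a per-hour loss summary, assuming a log format of one "YYYY-MM-DD HH:MM:SS host OK|LOSS" entry per line (host name and timestamps below are made up for illustration):

```shell
# Summarise an assumed "date time host OK|LOSS" ping log into
# loss counts per hour, to see when losses cluster.
sample='2020-10-05 03:00:01 ovirt2 OK
2020-10-05 03:00:02 ovirt2 LOSS
2020-10-05 03:00:03 ovirt2 LOSS
2020-10-05 04:10:07 ovirt2 OK'
printf '%s\n' "$sample" | awk '
    $4 == "LOSS" {split($2, t, ":"); loss[$1 " hour " t[1]]++}
    END {for (h in loss) print h ": " loss[h] " lost"}'
```

With a real log (e.g. /tmp/icmp_log from the loop suggested earlier in the thread, adapted to emit OK/LOSS markers), replace the sample with the file contents.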