[ovirt-users] Re: Gluster volume not responding

2020-10-23 Thread Strahil Nikolov via Users
Most probably, but I have no clue.

You can set the host into maintenance and then activate it, so the volume gets
mounted properly.

Best Regards,
Strahil Nikolov






On Friday, 23 October 2020 at 03:16:42 GMT+3, Simon Scott wrote:

Hi Strahil,

All networking configs have been checked and are correct.

I just looked at the gluster volume and noticed that the mount option
‘logbsize=256k’ is present on two nodes but missing on the third.

Status of volume: pltfm_data01
------------------------------------------------------------------------------
Brick                : Brick bdtpltfmovt01-strg:/gluster_bricks/pltfm_data01/pltfm_data01
TCP Port             : 49152
RDMA Port            : 0
Online               : Y
Pid                  : 24372
File System          : xfs
Device               : /dev/mapper/gluster_vg_sdb-gluster_lv_pltfm_data01
Mount Options        : rw,seclabel,noatime,nodiratime,attr2,inode64,logbsize=256k,sunit=512,swidth=512,noquota
Inode Size           : 512
Disk Space Free      : 552.0GB
Total Disk Space     : 1.5TB
Inode Count          : 157286400
Free Inodes          : 157245903
------------------------------------------------------------------------------
Brick                : Brick bdtpltfmovt02-strg:/gluster_bricks/pltfm_data01/pltfm_data01
TCP Port             : 49152
RDMA Port            : 0
Online               : Y
Pid                  : 24485
File System          : xfs
Device               : /dev/mapper/gluster_vg_sdb-gluster_lv_pltfm_data01
Mount Options        : rw,seclabel,noatime,nodiratime,attr2,inode64,logbsize=256k,sunit=512,swidth=512,noquota
Inode Size           : 512
Disk Space Free      : 552.0GB
Total Disk Space     : 1.5TB
Inode Count          : 157286400
Free Inodes          : 157245885
------------------------------------------------------------------------------
Brick                : Brick bdtpltfmovt03-strg:/gluster_bricks/pltfm_data01/pltfm_data01
TCP Port             : 49152
RDMA Port            : 0
Online               : Y
Pid                  : 24988
File System          : xfs
Device               : /dev/mapper/gluster_vg_sdb-gluster_lv_pltfm_data01
Mount Options        : rw,seclabel,noatime,nodiratime,attr2,inode64,sunit=512,swidth=512,noquota
Inode Size           : 512
Disk Space Free      : 552.0GB
Total Disk Space     : 1.5TB
Inode Count          : 157286400
Free Inodes          : 157245890
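
A low-effort way to confirm which bricks drifted is to extract just the Mount
Options field per brick from `gluster volume status <volume> detail` output.
This is only a sketch based on the field layout shown above; the sample
hostnames below are placeholders:

```shell
# Print one "brick => mount options" line per brick from
# `gluster volume status <volume> detail` output read on stdin.
mount_opts_per_brick() {
  awk '
    /^Brick +: Brick/    { host = $0; sub(/^Brick +: Brick +/, "", host) }
    /^Mount Options +: / { opts = $0; sub(/^Mount Options +: /, "", opts)
                           print host " => " opts }
  '
}

# Demo against a captured two-brick sample (placeholder hostnames); live use:
#   gluster volume status pltfm_data01 detail | mount_opts_per_brick
sample='Brick : Brick node01:/gluster_bricks/data
Mount Options : rw,noatime,logbsize=256k
Brick : Brick node02:/gluster_bricks/data
Mount Options : rw,noatime'
printf '%s\n' "$sample" | mount_opts_per_brick
```

Note that logbsize is an XFS mount-time option, so aligning the odd node out
would mean adding the option to that brick's /etc/fstab entry and remounting
the brick filesystem (e.g. with the host in maintenance).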





Is this possibly causing the instability issues we are experiencing under load?

Regards

Simon...



> On 11 Oct 2020, at 19:18, Strahil Nikolov  wrote:
>
> Hi Simon,
>
> Usually it is the network, but you need real-world data. I would open screen
> sessions and run ping continuously. Something like this:
>
> while true; do echo -n "$(date) "; timeout -s 9 1 ping -c 1 ovirt2 | grep
> icmp_seq; sleep 1; done | tee -a /tmp/icmp_log
> 
> Are all systems in the same network?
> What about dns resolution - do you have entries in /etc/hosts?
> 
> 
> Best Regards,
> Strahil Nikolov
> 
> 
> On Sunday, 11 October 2020 at 11:54:47 GMT+3, Simon Scott wrote:
>
> Thanks Strahil.
> 
> 
> 
> 
> I have found between 1 and 4 Gluster peer 'rpc-clnt-ping timer expired'
> messages in the rhev-data-center-mnt-glusterSD-hostname-strg:_pltfm_data01.log
> on the storage network IP. Of the 6 hosts, only 1 does not have these timeouts.
> 
> 
> 
> 
> Fencing has been disabled, but can you identify which logs are key to
> identifying the cause, please?
> 
> 
> 
> 
> It's a bonded (bond1) 10GbE ovirt-mgmt logical network plus Prod VM VLAN
> interface, AND a bonded (bond2) 10GbE Gluster storage network.
>
> Dropped packets are seen incrementing in vdsm.log, but neither ethtool -S
> nor the kernel logs show dropped packets. I am wondering if they are being
> dropped because the ring buffers are too small.
> 
> 
> 
> 
> Kind Regards
> 
> 
> 
> 
> Shimme
> 
> 
> 
> 
> 
>  
> From: Strahil Nikolov 
> Sent: Thursday 8 October 2020 20:40
> To: users@ovirt.org ; Simon Scott 
> Subject: Re: [ovirt-users] Gluster volume not responding 
>  
> 
> 
> 
> 
>> Every Monday and Wednesday morning there are gluster connectivity timeouts,
>> but all checks of the network and network configs are ok.
>> 
> 
> Based on this one I make the following conclusions:
> 1. The issue is recurring
> 2. You most probably have a network issue
>
> Have you checked the following:
> - Are there any ping timeouts between fuse clients and gluster nodes?
> - Have you tried to disable fencing and check the logs after the issue
> reoccurs?
> - Are you sharing Backup and Prod networks? Is it possible for some
> backup/other production load in your environment to "black-out" your oVirt?
> - Have you checked the gluster cluster's logs for anything meaningful?
> 
> Best Regards,
> Strahil Nikolov
> 
> 
> 
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/U527TGUQR6RV7Z426NWMO3

[ovirt-users] Re: Gluster volume not responding

2020-10-22 Thread Simon Scott
Hi Strahil,

All networking configs have been checked and are correct.

I just looked at the gluster volume and noticed that the mount option
‘logbsize=256k’ is present on two nodes but missing on the third.

Status of volume: pltfm_data01
------------------------------------------------------------------------------
Brick                : Brick bdtpltfmovt01-strg:/gluster_bricks/pltfm_data01/pltfm_data01
TCP Port             : 49152
RDMA Port            : 0
Online               : Y
Pid                  : 24372
File System          : xfs
Device               : /dev/mapper/gluster_vg_sdb-gluster_lv_pltfm_data01
Mount Options        : rw,seclabel,noatime,nodiratime,attr2,inode64,logbsize=256k,sunit=512,swidth=512,noquota
Inode Size           : 512
Disk Space Free      : 552.0GB
Total Disk Space     : 1.5TB
Inode Count          : 157286400
Free Inodes          : 157245903
------------------------------------------------------------------------------
Brick                : Brick bdtpltfmovt02-strg:/gluster_bricks/pltfm_data01/pltfm_data01
TCP Port             : 49152
RDMA Port            : 0
Online               : Y
Pid                  : 24485
File System          : xfs
Device               : /dev/mapper/gluster_vg_sdb-gluster_lv_pltfm_data01
Mount Options        : rw,seclabel,noatime,nodiratime,attr2,inode64,logbsize=256k,sunit=512,swidth=512,noquota
Inode Size           : 512
Disk Space Free      : 552.0GB
Total Disk Space     : 1.5TB
Inode Count          : 157286400
Free Inodes          : 157245885
------------------------------------------------------------------------------
Brick                : Brick bdtpltfmovt03-strg:/gluster_bricks/pltfm_data01/pltfm_data01
TCP Port             : 49152
RDMA Port            : 0
Online               : Y
Pid                  : 24988
File System          : xfs
Device               : /dev/mapper/gluster_vg_sdb-gluster_lv_pltfm_data01
Mount Options        : rw,seclabel,noatime,nodiratime,attr2,inode64,sunit=512,swidth=512,noquota
Inode Size           : 512
Disk Space Free      : 552.0GB
Total Disk Space     : 1.5TB
Inode Count          : 157286400
Free Inodes          : 157245890

Is this possibly causing the instability issues we are experiencing under load?

Regards

Simon...

On 11 Oct 2020, at 19:18, Strahil Nikolov  wrote:

Hi Simon,

Usually it is the network, but you need real-world data. I would open screen
sessions and run ping continuously. Something like this:

while true; do echo -n "$(date) "; timeout -s 9 1 ping -c 1 ovirt2 | grep 
icmp_seq; sleep 1; done | tee -a /tmp/icmp_log

Are all systems in the same network?
What about dns resolution - do you have entries in /etc/hosts?


Best Regards,
Strahil Nikolov


On Sunday, 11 October 2020 at 11:54:47 GMT+3, Simon Scott wrote:

Thanks Strahil.




I have found between 1 and 4 Gluster peer 'rpc-clnt-ping timer expired'
messages in the rhev-data-center-mnt-glusterSD-hostname-strg:_pltfm_data01.log
on the storage network IP. Of the 6 hosts, only 1 does not have these timeouts.




Fencing has been disabled, but can you identify which logs are key to
identifying the cause, please?




It's a bonded (bond1) 10GbE ovirt-mgmt logical network plus Prod VM VLAN
interface, AND a bonded (bond2) 10GbE Gluster storage network.

Dropped packets are seen incrementing in vdsm.log, but neither ethtool -S nor
the kernel logs show dropped packets. I am wondering if they are being dropped
because the ring buffers are too small.




Kind Regards




Shimme






From: Strahil Nikolov 
Sent: Thursday 8 October 2020 20:40
To: users@ovirt.org ; Simon Scott 
Subject: Re: [ovirt-users] Gluster volume not responding




> Every Monday and Wednesday morning there are gluster connectivity timeouts,
> but all checks of the network and network configs are ok.

Based on this one I make the following conclusions:
1. The issue is recurring
2. You most probably have a network issue

Have you checked the following:
- Are there any ping timeouts between fuse clients and gluster nodes?
- Have you tried to disable fencing and check the logs after the issue reoccurs?
- Are you sharing Backup and Prod networks? Is it possible for some backup/other
production load in your environment to "black-out" your oVirt?
- Have you checked the gluster cluster's logs for anything meaningful?

Best Regards,
Strahil Nikolov



___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/BQGSAK4YTKXY75D2TS3ACHS24RHQ7CDF/


[ovirt-users] Re: Gluster volume not responding

2020-10-11 Thread Strahil Nikolov via Users
Hi Simon,

Usually it is the network, but you need real-world data. I would open screen
sessions and run ping continuously. Something like this:

while true; do echo -n "$(date) "; timeout -s 9 1 ping -c 1 ovirt2 | grep 
icmp_seq; sleep 1; done | tee -a /tmp/icmp_log

Are all systems in the same network?
What about dns resolution - do you have entries in /etc/hosts?
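
For the resolution question, one quick check is to compare what each host
actually resolves for every peer, since a stale /etc/hosts entry on a single
node is enough to break a brick connection while "the network" still looks
fine. A sketch (the peer list is a placeholder; run it from every host and
diff the outputs):

```shell
# Placeholder peer list -- substitute the storage-network hostnames.
peers="localhost"

for h in $peers; do
  # getent follows the nsswitch.conf lookup order (/etc/hosts first,
  # then DNS), i.e. the same path the gluster daemons use.
  if addr=$(getent hosts "$h"); then
    echo "$h => $addr"
  else
    echo "$h => NO RESOLUTION"
  fi
done
```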


Best Regards,
Strahil Nikolov


On Sunday, 11 October 2020 at 11:54:47 GMT+3, Simon Scott wrote:

Thanks Strahil.




I have found between 1 and 4 Gluster peer 'rpc-clnt-ping timer expired'
messages in the rhev-data-center-mnt-glusterSD-hostname-strg:_pltfm_data01.log
on the storage network IP. Of the 6 hosts, only 1 does not have these timeouts.




Fencing has been disabled, but can you identify which logs are key to
identifying the cause, please?




It's a bonded (bond1) 10GbE ovirt-mgmt logical network plus Prod VM VLAN
interface, AND a bonded (bond2) 10GbE Gluster storage network.

Dropped packets are seen incrementing in vdsm.log, but neither ethtool -S nor
the kernel logs show dropped packets. I am wondering if they are being dropped
because the ring buffers are too small.




Kind Regards




Shimme





 
From: Strahil Nikolov 
Sent: Thursday 8 October 2020 20:40
To: users@ovirt.org ; Simon Scott 
Subject: Re: [ovirt-users] Gluster volume not responding 
 



> Every Monday and Wednesday morning there are gluster connectivity timeouts,
> but all checks of the network and network configs are ok.

Based on this one I make the following conclusions:
1. The issue is recurring
2. You most probably have a network issue

Have you checked the following:
- Are there any ping timeouts between fuse clients and gluster nodes?
- Have you tried to disable fencing and check the logs after the issue reoccurs?
- Are you sharing Backup and Prod networks? Is it possible for some backup/other
production load in your environment to "black-out" your oVirt?
- Have you checked the gluster cluster's logs for anything meaningful?

Best Regards,
Strahil Nikolov



___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/VXYVXMBCPU5UCTQVVHWQ3PLL2RHCJJ7G/


[ovirt-users] Re: Gluster volume not responding

2020-10-11 Thread Simon Scott
Thanks Strahil.

I have found between 1 and 4 Gluster peer 'rpc-clnt-ping timer expired'
messages in the rhev-data-center-mnt-glusterSD-hostname-strg:_pltfm_data01.log
on the storage network IP. Of the 6 hosts, only 1 does not have these timeouts.

Fencing has been disabled, but can you identify which logs are key to
identifying the cause, please?

It's a bonded (bond1) 10GbE ovirt-mgmt logical network plus Prod VM VLAN
interface, AND a bonded (bond2) 10GbE Gluster storage network.
Dropped packets are seen incrementing in vdsm.log, but neither ethtool -S nor
the kernel logs show dropped packets. I am wondering if they are being dropped
because the ring buffers are too small.
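
If small ring buffers are the suspicion, `ethtool -g` compares each NIC's
current RX/TX ring sizes against the hardware maximums, and `ethtool -G`
raises them; both act on the bond's physical slave interfaces, not on the
bond device itself. A sketch with placeholder NIC names (the real slave names
are listed in /proc/net/bonding/bond2), guarded so it degrades gracefully
where ethtool or the NIC is absent:

```shell
# Placeholder slave NICs behind bond1/bond2 -- substitute your own.
nics="eth0 eth1"

for dev in $nics; do
  echo "== $dev =="
  # "-g" prints "Pre-set maximums" vs "Current hardware settings".
  ethtool -g "$dev" 2>/dev/null || echo "(no ring info for $dev on this host)"
done

# To grow the buffers toward the hardware maximum (example values only):
#   ethtool -G eth0 rx 4096 tx 4096
# Changing ring sizes usually bounces the link briefly, so adjust one
# bond slave at a time.
```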

Kind Regards

Shimme


From: Strahil Nikolov 
Sent: Thursday 8 October 2020 20:40
To: users@ovirt.org ; Simon Scott 
Subject: Re: [ovirt-users] Gluster volume not responding

> Every Monday and Wednesday morning there are gluster connectivity timeouts,
> but all checks of the network and network configs are ok.

Based on this one I make the following conclusions:
1. The issue is recurring
2. You most probably have a network issue

Have you checked the following:
- Are there any ping timeouts between fuse clients and gluster nodes?
- Have you tried to disable fencing and check the logs after the issue reoccurs?
- Are you sharing Backup and Prod networks? Is it possible for some backup/other
production load in your environment to "black-out" your oVirt?
- Have you checked the gluster cluster's logs for anything meaningful?

Best Regards,
Strahil Nikolov


[ovirt-users] Re: Gluster volume not responding

2020-10-08 Thread Strahil Nikolov via Users
Hi Simon,

I doubt the system needs tuning from a network perspective.

I guess you can run some 'screen' sessions which ping another system and log
everything to a file.

Best Regards,
Strahil Nikolov






On Friday, 9 October 2020 at 01:05:22 GMT+3, Simon Scott wrote:

Thanks Strahil.




I have found between 1 and 4 Gluster peer 'rpc-clnt-ping timer expired'
messages in the rhev-data-center-mnt-glusterSD-hostname-strg:_pltfm_data01.log
on the storage network IP. Of the 6 hosts, only 1 does not have these timeouts.




Fencing has been disabled, but can you identify which logs are key to
identifying the cause, please?




It's a bonded (bond1) 10GbE ovirt-mgmt logical network plus Prod VM VLAN
interface, AND a bonded (bond2) 10GbE Gluster storage network.

Dropped packets are seen incrementing in vdsm.log, but neither ethtool -S nor
the kernel logs show dropped packets. I am wondering if they are being dropped
because the ring buffers are too small.




Kind Regards




Shimme





 
From: Strahil Nikolov 
Sent: Thursday 8 October 2020 20:40
To: users@ovirt.org ; Simon Scott 
Subject: Re: [ovirt-users] Gluster volume not responding 
 



> Every Monday and Wednesday morning there are gluster connectivity timeouts,
> but all checks of the network and network configs are ok.

Based on this one I make the following conclusions:
1. The issue is recurring
2. You most probably have a network issue

Have you checked the following:
- Are there any ping timeouts between fuse clients and gluster nodes?
- Have you tried to disable fencing and check the logs after the issue reoccurs?
- Are you sharing Backup and Prod networks? Is it possible for some backup/other
production load in your environment to "black-out" your oVirt?
- Have you checked the gluster cluster's logs for anything meaningful?

Best Regards,
Strahil Nikolov


___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/52YVULUR3YV4AQLKPLRN3OZ3JC4V4RZO/


[ovirt-users] Re: Gluster volume not responding

2020-10-08 Thread Strahil Nikolov via Users
I have seen many "checks" that are "OK"...
Have you checked that backups are not used over the same network?

I would disable the power management (fencing), so you can find out what has
happened to the systems.


Best Regards,
Strahil Nikolov






On Thursday, 8 October 2020 at 22:43:34 GMT+3, Strahil Nikolov via Users wrote:

> Every Monday and Wednesday morning there are gluster connectivity timeouts,
> but all checks of the network and network configs are ok.

Based on this one I make the following conclusions:
1. The issue is recurring
2. You most probably have a network issue

Have you checked the following:
- Are there any ping timeouts between fuse clients and gluster nodes?
- Have you tried to disable fencing and check the logs after the issue reoccurs?
- Are you sharing Backup and Prod networks? Is it possible for some backup/other
production load in your environment to "black-out" your oVirt?
- Have you checked the gluster cluster's logs for anything meaningful?

Best Regards,
Strahil Nikolov

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/TIDHSP34LVYUIDUU76OX3PFDEHL7LSWE/
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/HKJVBXDIN4DJ7LKFDQY6RBWFY5X2U6XW/


[ovirt-users] Re: Gluster volume not responding

2020-10-08 Thread Strahil Nikolov via Users
> Every Monday and Wednesday morning there are gluster connectivity timeouts,
> but all checks of the network and network configs are ok.

Based on this one I make the following conclusions:
1. The issue is recurring
2. You most probably have a network issue

Have you checked the following:
- Are there any ping timeouts between fuse clients and gluster nodes?
- Have you tried to disable fencing and check the logs after the issue reoccurs?
- Are you sharing Backup and Prod networks? Is it possible for some backup/other
production load in your environment to "black-out" your oVirt?
- Have you checked the gluster cluster's logs for anything meaningful?

Best Regards,
Strahil Nikolov