[ceph-users] Re: Network Flapping Causing Slow Ops and Freezing VMs

2024-01-08 Thread Eugen Block
You didn't mention which Ceph version you're running. Assuming that it's managed by cephadm, you could put the host in maintenance mode [1], which stops all services and then sets the noout flag for that host to prevent unnecessary recovery. Once the maintenance is done, exit the maintenance
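
A rough sketch of the maintenance-mode round trip described in this message, assuming a cephadm-managed cluster where `ceph orch host maintenance enter` and `ceph orch host maintenance exit` are available. The host name `ceph-node-01` and the helper names are hypothetical, and the script defaults to a dry run so nothing is executed unless you ask for it:

```python
import shlex
import subprocess


def maintenance_commands(host: str) -> dict[str, list[str]]:
    """Build the orchestrator commands for entering and leaving maintenance."""
    base = ["ceph", "orch", "host", "maintenance"]
    return {
        "enter": base + ["enter", host],
        "exit": base + ["exit", host],
    }


def run(cmd: list[str], dry_run: bool = True) -> str:
    """Print the command (dry run) or execute it; return the command line."""
    line = shlex.join(cmd)
    if dry_run:
        print(f"[dry-run] {line}")
        return line
    # check=True raises CalledProcessError if the ceph CLI reports a failure.
    subprocess.run(cmd, check=True)
    return line


if __name__ == "__main__":
    # "ceph-node-01" is a placeholder, not a host from this thread.
    cmds = maintenance_commands("ceph-node-01")
    run(cmds["enter"])  # stop the host's services before the network work
    # ... perform the maintenance here ...
    run(cmds["exit"])   # bring the host back once the maintenance is done
```

Pass `dry_run=False` to `run()` only on a node that has the `ceph` CLI and an admin keyring; whether this fits depends on the Ceph version, which was not stated in the thread.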

[ceph-users] Re: Network Flapping Causing Slow Ops and Freezing VMs

2024-01-08 Thread mahnoosh shahidi
Hi Eugen, Yes, the OSDs were marked as down by the MONs and there was "wrongly marked as down" in the logs, but the OSDs were down the whole time. Actually, I was looking for a fast-fail procedure for this kind of situation, because any manual action would take time and can cause major incidents. Best Regards,

[ceph-users] Re: Network Flapping Causing Slow Ops and Freezing VMs

2024-01-08 Thread Eugen Block
Hi, just to get a better understanding: when you write "Although the OSDs were correctly marked as down in the monitor, slow ops persisted until we resolved the network issue.", do you mean that the MONs marked the OSDs as down (temporarily), or did you do that? Because if the OSDs "flap"

[ceph-users] Re: Network Flapping Causing Slow Ops and Freezing VMs

2024-01-06 Thread mahnoosh shahidi
Hi Alexander, No, we are not using separate networks; they are on the same physical interfaces. On Sat, Jan 6, 2024 at 7:27 PM Alexander E. Patrakov wrote: > Hello Mahnoosh, > > Just to double check, can you confirm that you are NOT using a > physically separate cluster network and private

[ceph-users] Re: Network Flapping Causing Slow Ops and Freezing VMs

2024-01-06 Thread Alexander E. Patrakov
Hello Mahnoosh, Just to double check, can you confirm that you are NOT using a physically separate cluster network and private network? A configuration with such physically separate networks is inherently vulnerable and therefore cannot be recommended. VLANs on the same physical interface are