You didn't mention which Ceph version you're running. Assuming
it's managed by cephadm, you could put the host into maintenance mode [1],
which stops all services on that host and sets the noout flag for it
to prevent unnecessary recovery.
Once the maintenance is done, exit the maintenance
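
For illustration, the enter/exit sequence described above could be scripted roughly as follows. This is only a sketch that assumes a cephadm-managed cluster where the `ceph orch host maintenance enter` and `ceph orch host maintenance exit` commands are available. The host name and the helper functions `maintenance_cmd` and `run` are hypothetical, and the script only prints the commands unless `dry_run` is set to False.

```python
import shlex
import subprocess


def maintenance_cmd(host: str, action: str) -> list[str]:
    """Build the cephadm orchestrator command to enter or exit maintenance."""
    if action not in ("enter", "exit"):
        raise ValueError(f"action must be 'enter' or 'exit', got {action!r}")
    return ["ceph", "orch", "host", "maintenance", action, host]


def run(cmd: list[str], dry_run: bool = True) -> str:
    """Print the command line; execute it only when dry_run is False."""
    line = shlex.join(cmd)
    print(line)
    if not dry_run:
        # check=True raises CalledProcessError if the ceph CLI reports failure
        subprocess.run(cmd, check=True)
    return line


if __name__ == "__main__":
    host = "ceph-node-01"  # hypothetical host name, replace with your own
    run(maintenance_cmd(host, "enter"))
    # ... do the actual maintenance work on the host here ...
    run(maintenance_cmd(host, "exit"))
```

Keeping the dry run as the default means the commands can be reviewed before anything is stopped on the host.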
Hi Eugen
Yes, the OSDs were marked as down by the MONs and there was "wrongly marked as down"
in the logs, but the OSDs were down the whole time. Actually, I was looking for
a fast-fail procedure for this kind of situation, because any manual action
takes time and can cause major incidents.
Best Regards,
Hi,
just to get a better understanding, when you write
> Although the OSDs were correctly marked as down in the monitor, slow
> ops persisted until we resolved the network issue.
do you mean that the MONs marked the OSDs as down (temporarily) or did
you do that? Because if the OSDs "flap"
Hi Alexander,
No, we are not using separate networks; both are on the same physical
interfaces.
On Sat, Jan 6, 2024 at 7:27 PM Alexander E. Patrakov wrote:
> Hello Mahnoosh,
>
> Just to double check, can you confirm that you are NOT using a
> physically separate cluster network and private
Hello Mahnoosh,
Just to double check, can you confirm that you are NOT using a
physically separate cluster network and private network? A
configuration with such physically separate networks is inherently
vulnerable and therefore cannot be recommended. VLANs on the same
physical interface are