--- Begin Message ---
Hi Grzegorz,
September 19, 2023 12:28 PM, "lord_Niedzwiedz" <[email protected]> wrote:
> I have non heterogenic network and hardware.
> CEPH about Write 1160MB/sec, Read 1820 MB/sec
>
> One nvme drive started going crazy.
Have a look at the OSD latency. If it is high, it might be either a HW issue
or degraded RocksDB performance. In the latter case you can try an
offline compaction [0].
One more thing: arrays are nasty, they have a lot of other quirks. It might not
be either of the points mentioned above at all. ;)
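To make the latency check concrete, here is a rough sketch that flags slow OSDs from the JSON output of `ceph osd perf`. The 50 ms threshold and the function names are my own assumptions, and the JSON layout (an `osd_perf_infos` list, nested under `osdstats` on newer releases) is what I would expect to see, so verify it against the output of your version first.

```python
import json
import subprocess

# Assumed threshold, not a Ceph default; healthy NVMe OSDs usually sit far below it.
LATENCY_THRESHOLD_MS = 50


def slow_osds(perf_json, threshold_ms=LATENCY_THRESHOLD_MS):
    """Return (osd_id, commit_ms, apply_ms) tuples for OSDs at or above the threshold."""
    data = json.loads(perf_json)
    # Assumption: newer releases nest the list under "osdstats",
    # older ones keep "osd_perf_infos" at the top level.
    infos = data.get("osdstats", data).get("osd_perf_infos", [])
    slow = []
    for info in infos:
        stats = info.get("perf_stats", {})
        commit = stats.get("commit_latency_ms", 0)
        apply_ = stats.get("apply_latency_ms", 0)
        if max(commit, apply_) >= threshold_ms:
            slow.append((info["id"], commit, apply_))
    # Worst offender first.
    return sorted(slow, key=lambda row: max(row[1], row[2]), reverse=True)


def read_osd_perf():
    """Run `ceph osd perf` on a node with an admin keyring and return its JSON text."""
    result = subprocess.run(
        ["ceph", "osd", "perf", "--format", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

On a cluster node, `slow_osds(read_osd_perf())` would list the OSDs worth a closer look. A single OSD standing far above the rest would point to that one drive rather than a cluster-wide problem.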
> The performance of the entire array dropped catastrophically.
> The system said nothing.
> I wonder if there is any mechanism in CEPH/Proxmox that informs us about
> this automatically ??
You can activate the Prometheus module [1] on the MGR and scrape performance
data and alerts from there. Modules for other monitoring solutions exist as
well.
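As a rough illustration of what you get once the module is enabled (`ceph mgr module enable prometheus`), the sketch below reads the text the MGR exports and picks out OSDs with high commit latency. The host name and the 50 ms threshold are hypothetical. The port 9283, the metric name `ceph_osd_commit_latency_ms`, and the `ceph_daemon` label are what I would expect from the module, so check them against your own `/metrics` output. In practice you would let Prometheus scrape this and define alert rules there; this only shows the data is available.

```python
import re
import urllib.request

# Hypothetical MGR host; 9283 is assumed to be the module's default port.
METRICS_URL = "http://mgr-host.example:9283/metrics"

# One sample line: name, optional {labels}, value.
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_metric(text, metric_name):
    """Yield (labels, value) for every sample of metric_name in exposition text."""
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match and match.group("name") == metric_name:
            labels = dict(LABEL_RE.findall(match.group("labels") or ""))
            yield labels, float(match.group("value"))


def high_latency_osds(text, threshold_ms=50.0,
                      metric="ceph_osd_commit_latency_ms"):
    """Map daemon name to latency for OSDs at or above the assumed threshold."""
    return {
        labels.get("ceph_daemon", "?"): value
        for labels, value in parse_metric(text, metric)
        if value >= threshold_ms
    }


def fetch_metrics(url=METRICS_URL):
    """Download the metrics page from the MGR."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")
```

With the real MGR address filled in, `high_latency_osds(fetch_metrics())` returns the same kind of per-OSD view that an alert rule would evaluate.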
Cheers,
Alwin
[0] https://github.com/cernceph/ceph-scripts/blob/master/tools/bluestore/offline-compact.sh
[1] https://docs.ceph.com/en/latest/mgr/prometheus/
--- End Message ---
_______________________________________________
pve-user mailing list
[email protected]
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user