[ceph-users] Re: Ceph Outage (Nautilus) - 14.2.11 [EXT]

2020-12-16 Thread Suresh Rama
Thanks Stefan. Will review your feedback. Matt suggested the same. On Wed, Dec 16, 2020, 4:38 AM Stefan Kooman wrote: > On 12/16/20 10:21 AM, Matthew Vernon wrote: > > Hi, > > > > On 15/12/2020 20:44, Suresh Rama wrote: > > > > TL;DR: use a real NTP client, not systemd-timesyncd > > +1. We

[ceph-users] Re: Ceph Outage (Nautilus) - 14.2.11 [EXT]

2020-12-16 Thread Stefan Kooman
On 12/16/20 10:21 AM, Matthew Vernon wrote: Hi, On 15/12/2020 20:44, Suresh Rama wrote: TL;DR: use a real NTP client, not systemd-timesyncd +1. We have a lot of "ntp" daemons running, but on Ceph we use "chrony", and it converges much faster (especially with very unstable clock

[ceph-users] Re: Ceph Outage (Nautilus) - 14.2.11

2020-12-16 Thread Frédéric Nass
Hi Suresh, 24 HDDs backed by only 2 NVMes looks like a high ratio. What rings a bell for me in your post is "upgraded from Luminous to Nautilus" and "Elasticsearch", which mainly reads to index data, and also "memory leak". You might want to take a look at the current value of

[ceph-users] Re: Ceph Outage (Nautilus) - 14.2.11 [EXT]

2020-12-16 Thread Matthew Vernon
Hi, On 15/12/2020 20:44, Suresh Rama wrote: TL;DR: use a real NTP client, not systemd-timesyncd 1) We audited the network (inspecting TOR, iperf, MTR) and nothing indicated any issue, but the OSD logs kept complaining about BADAUTHORIZER ...this is quite possibly due to clock skew on