Re: [ceph-users] How do you deal with "clock skew detected"?

2019-05-16 Thread Uwe Sauter
You could also edit your ceph-mon@.service (assuming systemd) so that it depends on chrony, and add the line
"ExecStartPre=/usr/bin/sleep 30" to delay startup and give chrony a chance to sync before the mon is started.

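A minimal sketch of such a drop-in, assuming a systemd host where chrony runs as chronyd.service (on Debian and Ubuntu the unit is usually called chrony.service). The helper name write_mon_dropin and the file name wait-for-chrony.conf are hypothetical. The helper only writes the file below the directory you pass in, so it can be tried against a scratch directory first.

```shell
#!/bin/sh
# Hypothetical helper: write a drop-in for ceph-mon@.service below the
# systemd unit directory given as $1 (e.g. /etc/systemd/system).
write_mon_dropin() {
    dropin_dir="$1/ceph-mon@.service.d"
    mkdir -p "$dropin_dir" || return 1
    # Assumption: the chrony unit is called chronyd.service on this host.
    cat > "$dropin_dir/wait-for-chrony.conf" <<'EOF'
[Unit]
After=chronyd.service
Wants=chronyd.service

[Service]
ExecStartPre=/usr/bin/sleep 30
EOF
}
```

On a real mon this would be "write_mon_dropin /etc/systemd/system" followed by "systemctl daemon-reload". A fixed sleep is a blunt tool; "chronyc waitsync" blocks until chrony reports the clock as synchronized and may be a better fit for ExecStartPre, but I have not tried that here.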



On 16.05.19 at 17:38, Stefan Kooman wrote:
> Quoting Jan Kasprzak (k...@fi.muni.cz):
>
>> OK, many responses (thanks for them!) suggest chrony, so I tried it:
>> With all three mons running chrony and being in sync with my NTP server
>> with offsets under 0.0001 second, I rebooted one of the mons:
>>
>> There still was the HEALTH_WARN clock_skew message as soon as
>> the rebooted mon starts responding to ping. The cluster returns to
>> HEALTH_OK about 95 seconds later.
>>
>> According to "ntpdate -q my.ntp.server", the initial offset
>> after reboot is about 0.6 s (which is the reason of HEALTH_WARN, I think),
>> but it gets under 0.0001 s in about 25 seconds. The remaining ~50 seconds
>> of HEALTH_WARN is inside Ceph, with mons being already synchronized.
>>
>> So the result is that chrony indeed synchronizes faster,
>> but nevertheless I still have about 95 seconds of HEALTH_WARN "clock skew
>> detected".
>>
>> I guess the workaround for now is to ignore the warning, and wait
>> for two minutes before rebooting another mon.
>
> You can tune the "mon_timecheck_skew_interval" which by default is set
> to 30 seconds. See [1] and look for "timecheck" to find the different
> options.
>
> Gr. Stefan
>
> [1]:
> http://docs.ceph.com/docs/master/rados/configuration/mon-config-ref/

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How do you deal with "clock skew detected"?

2019-05-16 Thread Stefan Kooman
Quoting Jan Kasprzak (k...@fi.muni.cz):

>   OK, many responses (thanks for them!) suggest chrony, so I tried it:
> With all three mons running chrony and being in sync with my NTP server
> with offsets under 0.0001 second, I rebooted one of the mons:
> 
>   There still was the HEALTH_WARN clock_skew message as soon as
> the rebooted mon starts responding to ping. The cluster returns to
> HEALTH_OK about 95 seconds later.
> 
>   According to "ntpdate -q my.ntp.server", the initial offset
> after reboot is about 0.6 s (which is the reason of HEALTH_WARN, I think),
> but it gets under 0.0001 s in about 25 seconds. The remaining ~50 seconds
> of HEALTH_WARN is inside Ceph, with mons being already synchronized.
> 
>   So the result is that chrony indeed synchronizes faster,
> but nevertheless I still have about 95 seconds of HEALTH_WARN "clock skew
> detected".
> 
>   I guess the workaround for now is to ignore the warning, and wait
> for two minutes before rebooting another mon.

You can tune the "mon_timecheck_skew_interval" which by default is set
to 30 seconds. See [1] and look for "timecheck" to find the different
options.
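
A minimal sketch of changing that option at runtime, assuming a release that has the centralized config database ("ceph config set", Mimic or later); on older releases the same option would go into the [mon] section of ceph.conf instead. The helper names set_skew_interval and show_time_sync and the CEPH override are hypothetical.

```shell
#!/bin/sh
# CEPH names the ceph CLI; override it (e.g. CEPH=echo) for a dry run.
CEPH="${CEPH:-ceph}"

# Hypothetical helper: set mon_timecheck_skew_interval (in seconds) for all mons.
set_skew_interval() {
    "$CEPH" config set mon mon_timecheck_skew_interval "$1"
}

# Show what the mons currently report about each other's clocks.
show_time_sync() {
    "$CEPH" time-sync-status
}
```

For example "set_skew_interval 10" followed by "show_time_sync". The value 10 is only an illustration, not a recommendation; a shorter interval should make the leader re-check a skewed mon sooner, so the warning can clear earlier once the clocks agree.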

Gr. Stefan

[1]:
http://docs.ceph.com/docs/master/rados/configuration/mon-config-ref/

-- 
| BIT BV  http://www.bit.nl/  Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl


Re: [ceph-users] How do you deal with "clock skew detected"?

2019-05-16 Thread Jan Kasprzak
Konstantin Shalygin wrote:
: >how do you deal with the "clock skew detected" HEALTH_WARN message?
: >
: >I think the internal RTC in most x86 servers does have 1 second resolution
: >only, but Ceph skew limit is much smaller than that. So every time I reboot
: >one of my mons (for kernel upgrade or something), I have to wait for several
: >minutes for the system clock to synchronize over NTP, even though ntpd
: >has been running before reboot and was started during the system boot again.
: 
: Definitely you should use chrony with iburst.

OK, many responses (thanks for them!) suggest chrony, so I tried it:
With all three mons running chrony and being in sync with my NTP server
with offsets under 0.0001 second, I rebooted one of the mons:

The HEALTH_WARN clock_skew message still appeared as soon as
the rebooted mon started responding to ping. The cluster returned to
HEALTH_OK about 95 seconds later.

According to "ntpdate -q my.ntp.server", the initial offset
after reboot is about 0.6 s (which is the reason for the HEALTH_WARN, I think),
but it gets under 0.0001 s in about 25 seconds. The remaining ~50 seconds
of HEALTH_WARN are inside Ceph, with the mons already synchronized.

So the result is that chrony indeed synchronizes faster,
but I still have about 95 seconds of HEALTH_WARN "clock skew
detected".

I guess the workaround for now is to ignore the warning and wait
for two minutes before rebooting another mon.

-Yenya

-- 
| Jan "Yenya" Kasprzak  |
| http://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5 |
sir_clive> I hope you don't mind if I steal some of your ideas?
 laryross> As far as stealing... we call it sharing here.   --from rcgroups


Re: [ceph-users] How do you deal with "clock skew detected"?

2019-05-15 Thread Konstantin Shalygin

> how do you deal with the "clock skew detected" HEALTH_WARN message?
>
> I think the internal RTC in most x86 servers does have 1 second resolution
> only, but Ceph skew limit is much smaller than that. So every time I reboot
> one of my mons (for kernel upgrade or something), I have to wait for several
> minutes for the system clock to synchronize over NTP, even though ntpd
> has been running before reboot and was started during the system boot again.


Definitely you should use chrony with iburst.
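
For reference, a minimal sketch of a chrony configuration with iburst. The server name my.ntp.server is the placeholder used elsewhere in this thread, the helper name write_chrony_conf is hypothetical, and the makestep values are an assumption rather than something tested on this cluster. iburst makes chronyd send a quick burst of requests at startup, and makestep lets it step the clock instead of slewing during the first few updates.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal chrony.conf to the path given as $1,
# using the NTP server given as $2.
write_chrony_conf() {
    cat > "$1" <<EOF
# Burst of initial requests for a fast first sync.
server $2 iburst
# Step the clock if the offset exceeds 0.1 s, but only during the
# first 3 clock updates after chronyd starts (values are an assumption).
makestep 0.1 3
# Keep the hardware clock in step with the system clock.
rtcsync
EOF
}
```

Write it to a scratch path first, review it, then merge it into /etc/chrony.conf (/etc/chrony/chrony.conf on Debian) and restart chronyd. "chronyc tracking" shows the current offset afterwards.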



k



Re: [ceph-users] How do you deal with "clock skew detected"?

2019-05-15 Thread Alexandre DERUMIER
Since I switched to chrony instead of ntpd/openntpd, I don't have clock skew anymore.

(chrony really is faster to resync)

----- Original Message -----
From: "Jan Kasprzak"
To: "ceph-users"
Sent: Wednesday, 15 May 2019 13:47:57
Subject: [ceph-users] How do you deal with "clock skew detected"?

Hello, Ceph users, 

how do you deal with the "clock skew detected" HEALTH_WARN message? 

I think the internal RTC in most x86 servers does have 1 second resolution 
only, but Ceph skew limit is much smaller than that. So every time I reboot 
one of my mons (for kernel upgrade or something), I have to wait for several 
minutes for the system clock to synchronize over NTP, even though ntpd 
has been running before reboot and was started during the system boot again. 

Thanks, 

-Yenya 



Re: [ceph-users] How do you deal with "clock skew detected"?

2019-05-15 Thread EDH - Manuel Rios Fernandez
We set up two monitors as NTP servers, and the other nodes sync from the monitors.

-----Original Message-----
From: ceph-users On Behalf Of Richard Hesketh
Sent: Wednesday, 15 May 2019 14:04
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] How do you deal with "clock skew detected"?

Another option would be adding a boot time script which uses ntpdate (or
something) to force an immediate sync with your timeservers before ntpd starts 
- this is actually suggested in ntpdate's man page!

Rich

On 15/05/2019 13:00, Marco Stuurman wrote:
> Hi Yenya,
> 
> You could try to synchronize the system clock to the hardware clock 
> before rebooting. Also try chrony, it catches up very fast.
> 
> 
> Kind regards,
> 
> Marco Stuurman
> 
> 
> On Wed, 15 May 2019 at 13:48, Jan Kasprzak <k...@fi.muni.cz> wrote:
> 
> Hello, Ceph users,
> 
> how do you deal with the "clock skew detected" HEALTH_WARN message?
> 
> I think the internal RTC in most x86 servers does have 1 second resolution
> only, but Ceph skew limit is much smaller than that. So every time I reboot
> one of my mons (for kernel upgrade or something), I have to wait for several
> minutes for the system clock to synchronize over NTP, even though ntpd
> has been running before reboot and was started during the system boot again.
> 
> Thanks,
> 
> -Yenya




Re: [ceph-users] How do you deal with "clock skew detected"?

2019-05-15 Thread Richard Hesketh
Another option would be adding a boot time script which uses ntpdate (or
something) to force an immediate sync with your timeservers before ntpd
starts - this is actually suggested in ntpdate's man page!
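
A minimal sketch of such a boot-time step, assuming ntpdate is installed and the script runs as root before ntpd starts. my.ntp.server is a placeholder, and the helper name force_time_sync and the NTPDATE override are hypothetical. The -b flag asks ntpdate to step the clock rather than slew it.

```shell
#!/bin/sh
# NTPDATE names the ntpdate binary; override it (e.g. NTPDATE=echo) for a dry run.
NTPDATE="${NTPDATE:-ntpdate}"

# Hypothetical helper: step the system clock once from the server given as $1.
force_time_sync() {
    "$NTPDATE" -b "$1"
}
```

Called as "force_time_sync my.ntp.server" from whatever runs ahead of ntpd on your distribution (an init script, or an ExecStartPre line in the ntpd unit).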

Rich

On 15/05/2019 13:00, Marco Stuurman wrote:
> Hi Yenya,
> 
> You could try to synchronize the system clock to the hardware clock
> before rebooting. Also try chrony, it catches up very fast.
> 
> 
> Kind regards,
> 
> Marco Stuurman
> 
> 
> On Wed, 15 May 2019 at 13:48, Jan Kasprzak wrote:
> 
> Hello, Ceph users,
> 
> how do you deal with the "clock skew detected" HEALTH_WARN message?
> 
> I think the internal RTC in most x86 servers does have 1 second resolution
> only, but Ceph skew limit is much smaller than that. So every time I reboot
> one of my mons (for kernel upgrade or something), I have to wait for several
> minutes for the system clock to synchronize over NTP, even though ntpd
> has been running before reboot and was started during the system boot again.
> 
> Thanks,
> 
> -Yenya





Re: [ceph-users] How do you deal with "clock skew detected"?

2019-05-15 Thread Marco Stuurman
Hi Yenya,

You could try to synchronize the system clock to the hardware clock before
rebooting. Also try chrony, it catches up very fast.
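
A minimal sketch of the first suggestion, assuming the util-linux hwclock and root privileges. The helper name sync_rtc and the HWCLOCK override are hypothetical.

```shell
#!/bin/sh
# HWCLOCK names the hwclock binary; override it (e.g. HWCLOCK=echo) for a dry run.
HWCLOCK="${HWCLOCK:-hwclock}"

# Hypothetical helper: copy the (NTP-disciplined) system time into the hardware clock.
sync_rtc() {
    "$HWCLOCK" --systohc
}
```

Run "sync_rtc" right before "reboot", so the RTC that the kernel reads at boot starts out close to the synchronized system time.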


Kind regards,

Marco Stuurman


On Wed, 15 May 2019 at 13:48, Jan Kasprzak wrote:

> Hello, Ceph users,
>
> how do you deal with the "clock skew detected" HEALTH_WARN message?
>
> I think the internal RTC in most x86 servers does have 1 second resolution
> only, but Ceph skew limit is much smaller than that. So every time I reboot
> one of my mons (for kernel upgrade or something), I have to wait for several
> minutes for the system clock to synchronize over NTP, even though ntpd
> has been running before reboot and was started during the system boot again.
>
> Thanks,
>
> -Yenya