[chrony-users] chrony and ntpd xleave interoperability

2018-01-23 Thread FUSTE Emmanuel
Hello,

First, my apologies for the stray message on chrony-dev when I tried 
to subscribe to chrony-users...

I'm doing some tests to replace ntpd with chrony on some groups of servers.
These servers use a peer association with the interleave option.

When I try to do the same with ntpd on one side and chrony on the other, 
things go bad.
At best, chrony gets a working association with interleave status, but 
with a very long response time.
On the ntpd side, the association never works. The chrony server never 
gets to the "reach" state and the reach counter is stuck at zero.

As soon as I remove the xleave option on the ntpd side, everything 
immediately starts to work as expected.

ntpd :
peer y.y.y.y minpoll 5 maxpoll 10 xleave
restrict y.y.y.y notrap nomodify noquery

chrony :
peer x.x.x.x xleave minpoll 5  maxpoll 10
allow x.x.x.0/24
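
For reference, minpoll and maxpoll in both daemons are exponents of two, in seconds, so the values above span 32 s to 1024 s. A tiny sketch of the conversion (the helper name `poll_seconds` is made up for illustration and assumes a non-negative exponent):

```python
def poll_seconds(exponent: int) -> int:
    """Convert an NTP poll exponent (log2 of seconds) into seconds."""
    return 2 ** exponent


# The two values used in the configuration above.
for name, exponent in (("minpoll", 5), ("maxpoll", 10)):
    print(f"{name} {exponent} -> {poll_seconds(exponent)} s")
```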

Since yesterday, the xleave option had been removed on the ntpd side.
All was good on both sides.
So I tried to reactivate the xleave option
-> Boom, it works!!!

I restarted chrony
-> ntpd logged "receive: KoD packet from 192.54.145.235 has a zero org 
or rec timestamp. Ignoring."
and four minutes later "y.y.y.y 8613 83 unreachable"
The previously working association is now dead.
No working association from chrony either.

So I restarted ntpd
-> chrony starts to see the other server (ntpdata) but never reaches a 
good state.
-> ntpd does not reach the "reach" state.

Remove xleave from ntpd and restart
-> everything is still stuck
Restart chrony
-> ntpd starts to see the chrony server, the reach counter increments, 
and it reaches a "backup" condition. All is good on the chrony side.

Re-add the xleave option on the ntpd side.
The unreach counter increments, flash=1606 so packet_bogus...
On the chrony side, "Total valid RX" no longer increments...
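
The flash value is a hexadecimal bit mask, so more than one test can fail at once. A minimal sketch that decodes it, assuming the bit names listed in the ntpq documentation for the flash status word (the table and the function name `decode_flash` are written out here for illustration only):

```python
# Bit values and names as given in the ntpq documentation (flash status word).
FLASH_BITS = [
    (0x0001, "pkt_dup"),
    (0x0002, "pkt_bogus"),
    (0x0004, "pkt_unsync"),
    (0x0008, "pkt_denied"),
    (0x0010, "pkt_auth"),
    (0x0020, "pkt_stratum"),
    (0x0040, "pkt_header"),
    (0x0080, "pkt_autokey"),
    (0x0100, "pkt_crypto"),
    (0x0200, "peer_stratum"),
    (0x0400, "peer_dist"),
    (0x0800, "peer_loop"),
    (0x1000, "peer_unreach"),
]


def decode_flash(flash: int) -> list:
    """Return the names of all tests flagged in an ntpd flash word."""
    return [name for bit, name in FLASH_BITS if flash & bit]


# ntpq prints the value in hex, so flash=1606 is 0x1606.
print(decode_flash(0x1606))
```

With this table, 1606 includes pkt_bogus, which matches the observation above, together with several peer-level flags.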

I'm lost.

chrony 3.2
ntp-4.2.8p8, ntp-4.2.8p10

Can I normally expect xleave interoperability between chrony and ntpd, 
or is it something too "implementation specific"?

Emmanuel.


Re: [chrony-users] chrony and ntpd xleave interoperability

2018-01-23 Thread FUSTE Emmanuel
Le 23/01/2018 à 16:58, Miroslav Lichvar a écrit :
> On Tue, Jan 23, 2018 at 02:44:56PM +0100, FUSTE Emmanuel wrote:
>> Le 23/01/2018 à 13:00, Miroslav Lichvar a écrit :
>>> With the current versions, if you can avoid the issue with
>>> unsynchronized sources, they should interoperate, at least when their
>>> polling intervals match. If it doesn't work for you, I'd like to see a
>>> tcpdump output.
>> Ok. I fixed the min/max polling interval to 5 for testing purposes.
>> Then I first restarted chrony and waited for it to sync to an online source.
>> Then I restarted ntp and took a capture.
>> I will send you all the data.
>>
>> NTP is stuck in the unreachable state.
>> Chrony is stuck with only one valid RX.
> Ok. I can reproduce this problem. It seems ntpd doesn't update its
> state in the interleaved mode when it receives a packet with an
> unexpected origin timestamp. There was a similar issue fixed for the
> basic mode a few ntp releases ago:
> https://bugs.ntp.org/show_bug.cgi?id=2952
>
> As chronyd doesn't switch to the interleaved mode until it's receiving
> valid responses and ntpd doesn't accept responses in the basic mode,
> they are stuck waiting forever on each other.
>
> A similar thing seems to happen when trying to use the interleaved mode
> between two 4.2.8p10 ntpds. You said it worked for you before, so I
> assume one of the ntpds was an older version which didn't have this
> bug?
>
Here are data from the working 4.2.8p10 platform, which is composed of 
w.w.w.w, y.y.y.y and z.z.z.z

ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 29450  f414   yes   yes   ok  candidate   reachable  1
  2 29451  f414   yes   yes   ok  candidate   reachable  1
  3 29452  f31f   yes   yes   ok    outlier              1
  4 29453  961a   yes   yes  none  sys.peer    sys_peer  1
  5 29454  931d   yes   yes  none   outlier              1
ntpq> lpe
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+x.x.x.x         .MRS.            1 u    5    8  377    0.363    0.038   0.030
+y.y.y.y         .PTP0.           1 s   25   64  377    0.071    0.017   0.035
-z.z.z.z         .PTP0.           1 s   45   64  376    0.058    0.041   0.044
*SHM(0)          .PTP0.           0 l    2    8  377    0.000   -0.017   0.005
-ntp-gps-1.thale .GPS.            1 u    4    8  377    5.031   -0.435   0.020
ntpq> rv 29451
associd=29451 status=f414 conf, authenb, auth, reach, sel_candidate, 1 event, reachable,
srcadr=y.y.y.y, srcport=123, dstadr=w.w.w.w,
dstport=123, leap=00, stratum=1, precision=-23, rootdelay=0.000,
rootdisp=1.099, refid=PTP0,
reftime=de11e3d4.1850d73b  Tue, Jan 23 2018 17:39:48.094,
rec=de11e3db.18563cd1  Tue, Jan 23 2018 17:39:55.095, reach=376,
unreach=0, hmode=1, pmode=1, hpoll=6, ppoll=6, headway=51, flash=00 ok,
keyid=112, offset=0.017, delay=0.071, dispersion=1.719, jitter=0.035,
xleave=0.024,
filtdelay= 0.09    0.10    0.07    0.12    0.13    0.11 0.11    0.16,
filtoffset=   -0.01   -0.02    0.02    0.06    0.05   -0.01 -0.04    0.00,
filtdisp=  0.00    0.96    1.95    2.94    3.90    4.89 5.88    6.86
ntpq> rv 29452
associd=29452 status=f31f conf, authenb, auth, reach, sel_outlier, 1 event, interleave_error,
srcadr=z.z.z.z, srcport=123, dstadr=w.w.w.w,
dstport=123, leap=00, stratum=1, precision=-23, rootdelay=0.000,
rootdisp=1.099, refid=PTP0,
reftime=de11e4c0.a5c3751c  Tue, Jan 23 2018 17:43:44.647,
rec=de11e4c7.a5ca043a  Tue, Jan 23 2018 17:43:51.647, reach=377,
unreach=0, hmode=1, pmode=1, hpoll=6, ppoll=6, headway=13, flash=00 ok,
keyid=113, offset=0.041, delay=0.058, dispersion=5.542, jitter=0.062,
xleave=0.014,
filtdelay= 0.11    0.14    0.11    0.11    0.10    0.08 0.06    0.08,
filtoffset=    0.03   -0.05   -0.02   -0.02   -0.03   -0.02 0.04    0.09,
filtdisp=  0.00    0.98    1.92    2.87    3.84    4.83 5.78    6.75
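
The reach values in the output above (377, 376) are octal renderings of an 8-bit shift register in which the lowest bit is the most recent poll. A small sketch of how to read them (the function name `decode_reach` is only illustrative):

```python
def decode_reach(reach_octal: str) -> list:
    """Return the last eight poll results, newest first (True = reply received)."""
    value = int(reach_octal, 8) & 0xFF
    # Bit 0 is the most recent poll, bit 7 the oldest one still remembered.
    return [bool((value >> bit) & 1) for bit in range(8)]


for reach in ("377", "376"):
    print(reach, decode_reach(reach))
```

Read this way, 377 means all of the last eight polls were answered, while 376 means only the most recent one has not been answered (yet).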

Emmanuel.

Re: [chrony-users] chrony and ntpd xleave interoperability

2018-01-23 Thread FUSTE Emmanuel
Le 23/01/2018 à 13:00, Miroslav Lichvar a écrit :
> On Tue, Jan 23, 2018 at 11:31:38AM +0100, FUSTE Emmanuel wrote:
>> When I try to do the same with ntpd on one side and chrony on the other,
>> things go bad.
>> At best, chrony gets a working association with interleave status, but
>> with a very long response time.
> A long response time up to the polling interval of the peer is normal
> in symmetric associations.
>
>> On the ntpd side, the association never works. The chrony server never
>> gets to the "reach" state and the reach counter is stuck at zero.
> Have you tried the same configuration and the timing of restarts,
> between two ntpd servers? I suspect you would see some of the issues
> in this case too.
>
> There are probably multiple issues involved, which make it difficult
> to see what's going on. I'm aware of the following:
>
> - ntpd doesn't accept packets from peers that are not synchronized
>   (yet), so peers have to be configured with other sources in order
>   for the symmetric association (in both basic and interleaved modes)
>   to start. See https://bugs.ntp.org/show_bug.cgi?id=3445.
> - interleaved mode in ntpd works only when the peers use the same
>   polling interval. If they have the same minpoll and maxpoll, but
>   minpoll != maxpoll, they should in theory both get to the maxpoll
>   if the association doesn't work, but there may be a bug that
>   prevents that.
> - chrony switches to the basic mode when the polling intervals don't
>   match, but ntpd doesn't accept responses in the basic mode if the
>   interleaved mode is enabled
>
>> chrony 3.2
>> ntp-4.2.8p8, ntp-4.2.8p10
>>
>> Can I normally expect xleave interoperability between chrony and ntpd,
>> or is it something too "implementation specific"?
> With the current versions, if you can avoid the issue with
> unsynchronized sources, they should interoperate, at least when their
> polling intervals match. If it doesn't work for you, I'd like to see a
> tcpdump output.
Ok. I fixed the min/max polling interval to 5 for testing purposes.
Then I first restarted chrony and waited for it to sync to an online source.
Then I restarted ntp and took a capture.
I will send you all the data.

NTP is stuck in the unreachable state.
Chrony is stuck with only one valid RX.
>
> Please note that the symmetric mode has some security issues and it's
> generally recommended to use the client/server mode instead. Even if
> authentication is enabled, it is possible to break a symmetric
> association by replaying old packets. (chrony has a partial protection
> against this attack, but it works only in the basic mode when the
> polling intervals match and there are no packets with timestamps from
> future that could be replayed. It's too fragile, don't rely on it!)
Yes, I know. It is only used on "trusted" LAN segments and/or to try to 
inter-operate with ntpd xleave.
>
> It is possible that support for symmetric associations will be dropped
> from chrony in future.
>
I am only using it to transition from ntpd to chrony, so it will not be missed.
I hope my clock vendor will some day move from ntpd to something 
else (chrony) to get good xleave support (and much more).
In any case, I mainly use these clocks with PTP, so the NTP part only affects 
fail-over scenarios.

Emmanuel.


Re: [chrony-users] chrony and ntpd xleave interoperability

2018-01-24 Thread FUSTE Emmanuel
Le 24/01/2018 à 13:45, Miroslav Lichvar a écrit :
> On Tue, Jan 23, 2018 at 05:42:22PM +0100, FUSTE Emmanuel wrote:
>> Le 23/01/2018 à 16:58, Miroslav Lichvar a écrit :
>>> A similar thing seems to happen when trying to use the interleaved mode
>>> between two 4.2.8p10 ntpds. You said it worked for you before, so I
>>> assume one of the ntpds was an older version which didn't have this
>>> bug?
>> I have a platform with three ntpds in interleaved mode.
>> It was on 4.2.8p8.
>> They were upgraded today to 4.2.8p10 and are still working properly.
> You are right. My test was bad (it hit the bug with unsynchronized
> source).
>
> The bug in the interleaved mode is a bit more subtle. The state is
> updated from received packet, but only when one of the timestamps is
> zero (i.e. it's the first packet of the association). This means two
> ntpd 4.2.8p10 can interoperate, but I suspect the association will not
> recover if there is a mismatch between the receive timestamps.
>
> I'll send a bug report to the ntp maintainers.
>
> In the meantime, if you are willing to patch ntp, this should fix it:
>
> diff -up ntp-4.2.8p10/ntpd/ntp_proto.c.orig ntp-4.2.8p10/ntpd/ntp_proto.c
> --- ntp-4.2.8p10/ntpd/ntp_proto.c.orig 2018-01-24 13:35:16.611488502 +0100
> +++ ntp-4.2.8p10/ntpd/ntp_proto.c 2018-01-24 13:35:24.113505866 +0100
> @@ -1774,7 +1774,6 @@ receive(
>   peer->bogusorg++;
>   peer->flags |= FLAG_XBOGUS;
>   peer->flash |= TEST2;   /* bogus */
> - return; /* Bogus packet, we are done */
>   }
>   
Yes, it works!

Thank you.
Emmanuel.


Re: [chrony-users] NTP bogus timestamps - Chrony on openSUSE 15.1

2019-08-21 Thread FUSTE Emmanuel
Le 21/08/2019 à 16:00, James Knott a écrit :
> On 2019-08-21 09:44 AM, Miroslav Lichvar wrote:
>> It has no impact on accuracy.
> Maybe not on my local network, but what if the server was some distance
> away?  I realize NTP was developed back in the days when a 56 Kb/s
> connection was really something, but even with today's high bandwidth
> connections there is some latency that would cause the client to be
> slightly behind the server.  The calculations based on those time stamps
> were meant to determine that latency and correct for it.
>
> Incidentally, at work a few months ago, there was some discussion about
> NTP on a major LRT project I was working on, though I wasn't directly
> involved with the NTP servers.  On this system, they have 2 GPS/NTP
> servers, at different locations, that were to be synced with 2 other
> servers.  This system runs over a fibre backbone, that's 11 Km long and
> they're somewhat fussy about NTP.  I had to explain, to one of my
> co-workers, how NTP worked.
>
Please read the spec.
It is not used the way you think. It has NO impact on the way the 
calculations are done, so no impact on accuracy.

Emmanuel.
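
For readers who want to see the calculation James alludes to, the basic-mode offset and round-trip delay defined in RFC 5905 are derived from the four timestamps of one exchange. A minimal sketch; the timestamp values below are invented for illustration and do not come from this thread:

```python
def offset_and_delay(t1: float, t2: float, t3: float, t4: float):
    """Basic NTP on-wire calculation.

    t1: client transmit time (client clock)
    t2: server receive time  (server clock)
    t3: server transmit time (server clock)
    t4: client receive time  (client clock)
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


# Made-up example: client 0.5 s behind the server, 10 ms latency each way,
# 1 ms of processing time on the server.
offset, delay = offset_and_delay(100.000, 100.510, 100.511, 100.021)
print(f"offset={offset:.3f} s delay={delay:.3f} s")
```

The symmetric network latency cancels out of the offset, which is why a distant server does not by itself make the client lag behind.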

Re: [chrony-users] Resume from suspend and default makestep configuration

2020-05-18 Thread FUSTE Emmanuel
Hello Pali,

Le 18/05/2020 à 12:37, Pali Rohár a écrit :
> The main problem is when system is put into suspend or hibernate state.
>
> In my opinion resuming from suspend / hibernate state should be handled
> in the same way as (re)starting chronyd. You do not know what may
> have happened during sleep.
Yes, and if a workaround is needed, it should be done at the system 
level, not in chrony.
A job for systemd.
> And as I pointed there are existing problems that UEFI/BIOS firmware
> changes RTC clock without good reason which results in completely wrong
> system clock.
>
Such systems could well be identified by a blacklist at the udev/systemd 
level, to decide whether or not to apply the workaround (restart chrony 
or launch a chronyc command at resume).

Emmanuel.

Re: [chrony-users] Resume from suspend and default makestep configuration

2020-05-18 Thread FUSTE Emmanuel
Le 18/05/2020 à 13:15, Pali Rohár a écrit :
> On Monday 18 May 2020 10:45:02 FUSTE Emmanuel wrote:
>> Hello Pali,
>>
>> Le 18/05/2020 à 12:37, Pali Rohár a écrit :
>>> The main problem is when system is put into suspend or hibernate state.
>>>
>>> In my opinion resuming from suspend / hibernate state should be handled
>>> in the same way as (re)starting chronyd. You do not know what may
>>> happened during sleep.
>> Yes and in case of needed workaround, it should be done at the system
>> level, not chrony.
>> A job for systemd.
> Hello! Sorry for a stupid question, but what has systemd in common with
> chronyd? Why should systemd care about chronyd time synchronization?
Nothing.
But it is up to your "process manager", be it systemd, a sysvinit pile of 
scripts or whatever, to restart or notify chrony; it has to do 
housekeeping anyway for other things when you suspend/resume.
Exactly as NetworkManager, ifupdown scripts or systemd-networkd 
reload/restart some network services when interfaces/tunnels/VPNs are 
brought up or down.
>>> And as I pointed there are existing problems that UEFI/BIOS firmware
>>> changes RTC clock without good reason which results in completely wrong
>>> system clock.
>>>
>> Could well be identified by blacklist at the udev/systemd level for
>> applying or not the workaround (restart chrony or launch a chronyc
>> command at resume)
> Could you describe in details what do you mean by blacklist? Which udev
> blacklist you mean and what should be put into that blacklist? I have
> not caught this part.
Faulty systems could be identified by DMI/ACPI strings and a quirk applied.
See for example /lib/udev/hwdb.d/60-sensor.hwdb for some laptop sensors.
We could add an attribute to the RTC if it matches some vendor/BIOS 
version/model etc. put in the hwdb (the blacklist).
A udev rule would assign this attribute to the RTC if you are running on 
a known buggy system.
A script in /lib/systemd/system-sleep could do anything you want at 
suspend/resume time if your RTC has the offending attribute (see the 
systemd-sleep man page).
Or better, a unit run at resume time could do anything too.
The hwdb abstraction is not needed if it is a local hack; otherwise it 
should be properly defined with the hwdb/udev/systemd developers.
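
As a rough illustration of the local-hack variant, here is a sketch of a hook for /lib/systemd/system-sleep. It relies on the documented systemd-sleep convention of calling hooks with "pre" or "post" as first argument and the sleep action as second. The choice of `chronyc makestep` as the resume action and the function names are assumptions made for this example, not something agreed in this thread:

```python
#!/usr/bin/env python3
"""Sketch of a systemd-sleep hook that pokes chronyd after resume."""
import subprocess
import sys


def resume_command(phase: str, action: str):
    """Return the command to run for this sleep event, or None.

    phase is "pre" (about to sleep) or "post" (just resumed); action is the
    kind of sleep (suspend, hibernate, ...). Only resume is acted upon here.
    """
    if phase == "post":
        # Assumed policy: ask chronyd to step the clock right away.
        return ["chronyc", "makestep"]
    return None


def main(argv) -> None:
    if len(argv) < 3:
        return  # not invoked by systemd-sleep, do nothing
    command = resume_command(argv[1], argv[2])
    if command is not None:
        # check=False: a failing chronyc must not make the hook itself fail.
        subprocess.run(command, check=False)


if __name__ == "__main__":
    main(sys.argv)
```

Keeping the decision in a small pure function makes it easy to add a condition later, for example only acting when the machine is on a list of known buggy systems.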

If raised with the systemd developers, systemd-sleep / resume could take 
care of this directly and fire an appropriate target, based on a formally 
defined attribute in the hwdb.
What to do with this target could be configurable and default to a time 
daemon restart.
I'm not a systemd/udev/hwdb expert/developer, but I think this is a 
good track and deserves a discussion with them.

Anyway, chrony is not the level at which to tackle the problem; the proper 
level for managing it is the init/process manager. Hwdb/udev is 
"a" way to share the faulty-systems information across the "init" ecosystem, 
information that is useful not only for chrony.
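
To make the "identify faulty systems by DMI strings" idea concrete without involving hwdb, here is a sketch that matches the DMI identifiers Linux exposes under /sys/class/dmi/id against a local quirk list. The quirk entries and function names are entirely hypothetical placeholders:

```python
from fnmatch import fnmatch
from pathlib import Path

# Hypothetical quirk list: (sys_vendor, product_name, bios_version) glob patterns.
QUIRKS = [
    ("ExampleVendor*", "ExampleBook 13*", "1.0*"),
]

DMI_DIR = Path("/sys/class/dmi/id")
DMI_FIELDS = ("sys_vendor", "product_name", "bios_version")


def read_dmi(directory: Path = DMI_DIR) -> dict:
    """Read the DMI identification strings; missing files become empty strings."""
    fields = {}
    for name in DMI_FIELDS:
        try:
            fields[name] = (directory / name).read_text().strip()
        except OSError:
            fields[name] = ""
    return fields


def rtc_is_untrusted(dmi: dict, quirks=QUIRKS) -> bool:
    """True if this machine matches an entry in the list of buggy systems."""
    return any(
        all(fnmatch(dmi[field], pattern) for field, pattern in zip(DMI_FIELDS, entry))
        for entry in quirks
    )


if __name__ == "__main__":
    print("RTC untrusted after resume:", rtc_is_untrusted(read_dmi()))
```

A resume hook could consult such a check before deciding whether the time daemon needs to be restarted or notified.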

Emmanuel.


Re: [chrony-users] Resume from suspend and default makestep configuration

2020-05-19 Thread FUSTE Emmanuel
Le 19/05/2020 à 12:29, Pali Rohár a écrit :
> On Monday 18 May 2020 13:45:04 FUSTE Emmanuel wrote:
>> Le 18/05/2020 à 13:15, Pali Rohár a écrit :
>>> On Monday 18 May 2020 10:45:02 FUSTE Emmanuel wrote:
>>>> Hello Pali,
>>>>
>>>> Le 18/05/2020 à 12:37, Pali Rohár a écrit :
>>>>> The main problem is when system is put into suspend or hibernate state.
>>>>>
>>>>> In my opinion resuming from suspend / hibernate state should be handled
>>>>> in the same way as (re)starting chronyd. You do not know what may
>>>>> happened during sleep.
>>>> Yes and in case of needed workaround, it should be done at the system
>>>> level, not chrony.
>>>> A job for systemd.
>>> Hello! Sorry for a stupid question, but what has systemd in common with
>>> chronyd? Why should systemd care about chronyd time synchronization?
>> Nothing.
>> But it is to your "process manager" being systemd, sysvinit pile of
>> scripts or whatever to restart or notify chrony, it has do do
>> housekeeping anyway for other things when you suspend/resume.
> Hm... I remember that in past it was needed to blacklist broken daemons,
> software and kernel modules which did not work correctly during S3 or
> hibernate state. It was in some pm scripts utils...
>
> But I thought that these days are already passed and software can deal
> with fact that machine may be put into suspend or hibernate state.
>
> So what you are suggesting is to put chronyd daemon into list of broken
> software (which needs to be stopped prior suspend / resume)?
>
> It does not make sense for me as the immediate step after putting
> software or kernel module into such "blacklist" was to inform upstream
> authors of that daemon or kernel module they it is broken / incompatible
> with suspend state and it should be fixed.
>
> That "blacklist" was just workaround for buggy software and not
> permanent solution.
No, not chrony, but the machine which changes the RTC behind your back: buggy BIOS.
>
>> Exactly as networkmanager, ifupdown scripts, systemd-networkd
>> reload/restart some network services when interfaces/tunnels/vpn are
>> upped/downed.
> This is something totally different. all those mentioned "services" are
> just independent part of system which manages network connections.
>
> chronyd is there to manage time synchronization.
It was a figurative comparison for event-driven config changes.
In the suspend vs time case, the event is only known to, and 
should be managed by, your init system, not your time daemon.

>
>>>>> And as I pointed there are existing problems that UEFI/BIOS firmware
>>>>> changes RTC clock without good reason which results in completely wrong
>>>>> system clock.
>>>>>
>>>> Could well be identified by blacklist at the udev/systemd level for
>>>> applying or not the workaround (restart chrony or launch a chronyc
>>>> command at resume)
>>> Could you describe in details what do you mean by blacklist? Which udev
>>> blacklist you mean and what should be put into that blacklist? I have
>>> not caught this part.
>> Faulty systems could be identified by DMI/ACPI strings and quirk applied.
> And what is the faulty system?
Quoting yourself:

"as I pointed there are existing problems that UEFI/BIOS firmware
changes RTC clock without good reason"


>
> I think this is something general and not related to particular machine.
> I guess under specific conditions it may happen on any system.
>
>> See for example /lib/udev/hwdb.d/60-sensor.hwdb  for some laptop sensors.
>> We could add an attribute to the RTC if it matche some vendor/bios
>> version/model etc... to put in the hwdb (the blacklist)
>> A udev rule will assign this attribute to the RTC if you are running on
>> a known buggy system.
>> A script could do anything you want at suspend/resume time in
>> /lib/systemd/system-sleep if your RTC has the offended attribute (see
>> systemd-sleep man page).
>> Or better, a unit run at resume time could do anything too.
>> The hwdb abstraction is not need if it is a local hack and should be
>> properly defined with the hwdb/udev/systemd developers.
> This database is for describing hardware differences or issues.
>
> But above problem with time synchronization is general and hardware
> independent. You can simulate same issue on your machine.
>
> Just put your computer into hibernation. Then boot from liveUSB some
> Linux distribution and change RTC time. Turn off liveUSB and boot your
> hibern

Re: [chrony-users] Resume from suspend and default makestep configuration

2020-05-19 Thread FUSTE Emmanuel
Le 19/05/2020 à 13:30, Pali Rohár a écrit :
> On Tuesday 19 May 2020 11:10:01 FUSTE Emmanuel wrote:
>> Le 19/05/2020 à 12:29, Pali Rohár a écrit :
>>> On Monday 18 May 2020 13:45:04 FUSTE Emmanuel wrote:
>>>> Le 18/05/2020 à 13:15, Pali Rohár a écrit :
>>>>> On Monday 18 May 2020 10:45:02 FUSTE Emmanuel wrote:
>>>>>> Hello Pali,
>>>>>>
>>>>>> Le 18/05/2020 à 12:37, Pali Rohár a écrit :
>>>>>>> The main problem is when system is put into suspend or hibernate state.
>>>>>>>
>>>>>>> In my opinion resuming from suspend / hibernate state should be handled
>>>>>>> in the same way as (re)starting chronyd. You do not know what may
>>>>>>> happened during sleep.
>>>>>> Yes and in case of needed workaround, it should be done at the system
>>>>>> level, not chrony.
>>>>>> A job for systemd.
>>>>> Hello! Sorry for a stupid question, but what has systemd in common with
>>>>> chronyd? Why should systemd care about chronyd time synchronization?
>>>> Nothing.
>>>> But it is to your "process manager" being systemd, sysvinit pile of
>>>> scripts or whatever to restart or notify chrony, it has do do
>>>> housekeeping anyway for other things when you suspend/resume.
>>> Hm... I remember that in past it was needed to blacklist broken daemons,
>>> software and kernel modules which did not work correctly during S3 or
>>> hibernate state. It was in some pm scripts utils...
>>>
>>> But I thought that these days are already passed and software can deal
>>> with fact that machine may be put into suspend or hibernate state.
>>>
>>> So what you are suggesting is to put chronyd daemon into list of broken
>>> software (which needs to be stopped prior suspend / resume)?
>>>
>>> It does not make sense for me as the immediate step after putting
>>> software or kernel module into such "blacklist" was to inform upstream
>>> authors of that daemon or kernel module they it is broken / incompatible
>>> with suspend state and it should be fixed.
>>>
>>> That "blacklist" was just workaround for buggy software and not
>>> permanent solution.
>> No not chrony, but the machine which change RTC on your back : buggy Bios
> Sorry, but I have not caught this line. Blacklist contained list of
> buggy software, daemons and kernel modules which had to be (in past)
> stopped / unloaded prior system went to S3 and started / (re)loaded
> after system resumed. So obviously putting "buggy Bios" into blacklist
> not only does not make sense, but also it did nothing. In that
> particular case chronyd had to be put into that blacklist of buggy
> software as it as you described is chronyd which needs to be stopped /
> started... But as I said this was used in past when buggy software and
> kernel modules were there when they was not able to correctly handle S3
> state.
I said the machine, not chrony.
Please, I'm not a native English speaker, but this conversation is 
becoming more and more like trolling.
A blacklist is a black list; it is a generic term, as you point out.

>
>>>> Exactly as networkmanager, ifupdown scripts, systemd-networkd
>>>> reload/restart some network services when interfaces/tunnels/vpn are
>>>> upped/downed.
>>> This is something totally different. all those mentioned "services" are
>>> just independent part of system which manages network connections.
>>>
>>> chronyd is there to manage time synchronization.
>> It was an "imaged comparison" for event driven config change.
>> The event in the suspend vs time case,  the event is only know and
>> should be managed by your init system not by your time daemon.
>>
>>>>>>> And as I pointed there are existing problems that UEFI/BIOS firmware
>>>>>>> changes RTC clock without good reason which results in completely wrong
>>>>>>> system clock.
>>>>>>>
>>>>>> Could well be identified by blacklist at the udev/systemd level for
>>>>>> applying or not the workaround (restart chrony or launch a chronyc
>>>>>> command at resume)
>>>>> Could you describe in details what do you mean by blacklist? Which udev
>>>>> blacklist you mean and what should be put into that blacklist? I have
>>>>> not caught this part.
>>>> Faulty systems could be identified by D

Re: [chrony-users] Resume from suspend and default makestep configuration

2020-05-19 Thread FUSTE Emmanuel
Le 19/05/2020 à 15:11, Pali Rohár a écrit :
> On Tuesday 19 May 2020 12:42:28 FUSTE Emmanuel wrote:
>> Le 19/05/2020 à 13:30, Pali Rohár a écrit :
>>> On Tuesday 19 May 2020 11:10:01 FUSTE Emmanuel wrote:
>>>> Le 19/05/2020 à 12:29, Pali Rohár a écrit :
>>>>> On Monday 18 May 2020 13:45:04 FUSTE Emmanuel wrote:
>>>>>> Le 18/05/2020 à 13:15, Pali Rohár a écrit :
>>>>>>> On Monday 18 May 2020 10:45:02 FUSTE Emmanuel wrote:
>>>>>>>> Hello Pali,
>>>>>>>>
>>>>>>>> Le 18/05/2020 à 12:37, Pali Rohár a écrit :
>>>>>>>>> The main problem is when system is put into suspend or hibernate 
>>>>>>>>> state.
>>>>>>>>>
>>>>>>>>> In my opinion resuming from suspend / hibernate state should be 
>>>>>>>>> handled
>>>>>>>>> in the same way as (re)starting chronyd. You do not know what may
>>>>>>>>> happened during sleep.
>>>>>>>> Yes and in case of needed workaround, it should be done at the system
>>>>>>>> level, not chrony.
>>>>>>>> A job for systemd.
>>>>>>> Hello! Sorry for a stupid question, but what has systemd in common with
>>>>>>> chronyd? Why should systemd care about chronyd time synchronization?
>>>>>> Nothing.
>>>>>> But it is to your "process manager" being systemd, sysvinit pile of
>>>>>> scripts or whatever to restart or notify chrony, it has do do
>>>>>> housekeeping anyway for other things when you suspend/resume.
>>>>> Hm... I remember that in past it was needed to blacklist broken daemons,
>>>>> software and kernel modules which did not work correctly during S3 or
>>>>> hibernate state. It was in some pm scripts utils...
>>>>>
>>>>> But I thought that these days are already passed and software can deal
>>>>> with fact that machine may be put into suspend or hibernate state.
>>>>>
>>>>> So what you are suggesting is to put chronyd daemon into list of broken
>>>>> software (which needs to be stopped prior suspend / resume)?
>>>>>
>>>>> It does not make sense for me as the immediate step after putting
>>>>> software or kernel module into such "blacklist" was to inform upstream
>>>>> authors of that daemon or kernel module they it is broken / incompatible
>>>>> with suspend state and it should be fixed.
>>>>>
>>>>> That "blacklist" was just workaround for buggy software and not
>>>>> permanent solution.
>>>> No not chrony, but the machine which change RTC on your back : buggy Bios
>>> Sorry, but I have not caught this line. Blacklist contained list of
>>> buggy software, daemons and kernel modules which had to be (in past)
>>> stopped / unloaded prior system went to S3 and started / (re)loaded
>>> after system resumed. So obviously putting "buggy Bios" into blacklist
>>> not only does not make sense, but also it did nothing. In that
>>> particular case chronyd had to be put into that blacklist of buggy
>>> software as it as you described is chronyd which needs to be stopped /
>>> started... But as I said this was used in past when buggy software and
>>> kernel modules were there when they was not able to correctly handle S3
>>> state.
>> I said the machine not chrony.
>> Please I'm not native English, but this conversation became more and
>> more like a trooling one.
>> Blacklist are black list, this is a generic term as you point out.
> Sorry for that. Lets call it just list. If you want to somehow use
> machine in that list, then you probably need tuple 
> and teach scripts around to read that list as tuple and restart
> "software" if "machine" matches string of current machine on which it is
> running.
Yes, and the software in this case is "software that provides time sync".
>
> I'm saying that in past this was just list of "buggy" software and
> kernel modules which needs to be restarted during S3. It was not some
> smart structure where you was able to define rules like "if you are
> running on machine ABC then restart software CDE". And this is I guess
> what you want to achieve by putting machine on list.
>
>>>>>> Exactly as networkmanager, ifupdown scripts

Re: [chrony-users] Resume from suspend and default makestep configuration

2020-05-19 Thread FUSTE Emmanuel
Le 19/05/2020 à 17:54, Pali Rohár a écrit :
> On Tuesday 19 May 2020 17:36:15 Miroslav Lichvar wrote:
>> On Tue, May 19, 2020 at 03:11:42PM +0200, Pali Rohár wrote:
>>> Also when resuming from hibernation you may have been completely powered
>>> off and also memory of system may have been modified. Plus multiOS
>>> scenario may have applied, e.g. ordinary user just "booted" windows and
>>> then turned it off and resumed linux from hibernation. I guess we would
>>> agree that ordinary user does not use any virtualisation as you
>>> described below.
>> I don't think that's a common practice. If you suspend an OS and boot
>> another, all kind of things can break, like corrupted swaps, etc. If
>> you know what you are doing, fine, but don't be surprised when things
>> break.
> I know that lot of people are doing it. They are not developers,
> sysadmins or people who watch mailing list, ... just normal users.
> So from my observation, this is common. Maybe it is less common by
> developers who know what can happen and break, but not uncommon by
> ordinary non-power users.
>
> When hibernating windows it puts special signature on NTFS filesystems
> and Linux's ntfs-fuse refuse to mount in R/W mode such "hibernated" NTFS
> filesystem. So there is no corruption of hibernated windows state.
>
> Windows does not support accessing ext4, btrfs or linux swap so there is
> no corruption of linux fs/swap from windows.
>
>> When chronyd is running, it assumes it has full control over the
>> system clock. When you suspend and resume the OS or machine, the
>> system clock is reset to the RTC. chronyd can see there was a forward
>> jump, but it doesn't know what happened. systemd should know that and
>> there could be a unit to call the chronyc reset and makestep commands
>> if a significant offset is expected.
> But systemd cannot know that. It is chronyd who see that significant
Systemd knows that you are resuming from suspend.
> jump occurred and only after it synchronize time via NTP. And until NTP
Wrong. Chrony does not need to sync via NTP to see that the system clock 
jumped.

> daemon tells (somehow) that this jump occurred, systemd cannot know that
> during hibernation RTC clock was modified.
That should not happen in the normal case.
>
> This looks like a chicken and egg problem. systemd (or any other init /
> service system) does not know correct time after resuming system from
> suspend/hibernate, so it cannot check if RTC jump occurred. chronyd is
The RTC should be the trusted source in this case, so the init system, 
knowing that you are resuming from suspend, should notify the NTP daemon 
that all is OK (launch "chronyc reset" in the devel version).

After that, on a "sane" computer, you could even drop the makestep 
parameter, for the paranoid like me.
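
To illustrate why no NTP exchange is needed to notice such a jump, here is a sketch that compares the wall clock with the monotonic clock. It assumes Linux behaviour, where the monotonic clock does not advance while the machine is suspended. This is only an illustration of the principle, not a description of how chronyd detects jumps internally, and the names are made up:

```python
import time


def sample():
    """Take a (wall clock, monotonic clock) reading."""
    return time.time(), time.monotonic()


def unexpected_jump(previous, current, threshold: float = 1.0) -> float:
    """Return how far the wall clock moved beyond the monotonic clock.

    previous/current are (realtime, monotonic) pairs. Differences smaller
    than threshold seconds are treated as normal and reported as 0.0.
    """
    real_delta = current[0] - previous[0]
    mono_delta = current[1] - previous[1]
    jump = real_delta - mono_delta
    return jump if abs(jump) > threshold else 0.0


if __name__ == "__main__":
    before = sample()
    time.sleep(2)  # suspend/resume the machine here to see a non-zero result
    print("wall clock jump:", unexpected_jump(before, sample()), "s")
```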


Re: [chrony-users] Resume from suspend and default makestep configuration

2020-05-19 Thread FUSTE Emmanuel
Le 19/05/2020 à 16:11, Pali Rohár a écrit :
> On Tuesday 19 May 2020 13:40:18 FUSTE Emmanuel wrote:
>> Le 19/05/2020 à 15:11, Pali Rohár a écrit :
>>
>>> In the past I have often seen the problem that Windows stored system time in
>>> the local timezone to the RTC, then the computer was rebooted to Linux, which
>>> read system time from the RTC as UTC and saw an incorrect time. Installing an NTP
>>> daemon fixed this problem. And then after reboot Windows time was
>>> shifted and after a few seconds/minutes it synchronized it again against
>>> the Windows time server.
>> A better workaround: just instruct linux that the RTC is in the local
>> timezone and not UTC, and it would have worked.
> I remember that this setup did not work in one case: when linux system
> was booted prior booting windows system after DST change. Time was
> shifted two times, once by linux, once by windows as windows did not
> know that it should do it...
> Emmanuel.
Yes, the problem of shadow states. Only one can control the RTC, whether 
simultaneously (VM) or asynchronously (multiboot).
Your hibernation image is a shadow state too.
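
The double shift Pali describes comes down to simple arithmetic when two systems each believe they own a local-time RTC. A small worked sketch; the date and the one-hour shift are chosen arbitrarily for illustration:

```python
from datetime import datetime, timedelta

DST_SHIFT = timedelta(hours=1)


def apply_dst_once(rtc_local: datetime) -> datetime:
    """What one OS does when it believes the RTC still needs the DST shift."""
    return rtc_local + DST_SHIFT


# Correct local time on the morning after the change (arbitrary example date).
true_local = datetime(2020, 3, 29, 9, 0)
rtc_before = true_local - DST_SHIFT          # RTC still on the old offset

after_linux = apply_dst_once(rtc_before)     # Linux adjusts the RTC: now correct
after_windows = apply_dst_once(after_linux)  # Windows, unaware, adjusts it again

print("after Linux:  ", after_linux)
print("after Windows:", after_windows)
print("error:", after_windows - true_local)
```

Neither system records in the RTC that the adjustment has already been made, which is the shadow state mentioned above.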

Emmanuel.

Re: [chrony-users] NTS: Limiting

2021-01-20 Thread FUSTE Emmanuel
Le 20/01/2021 à 10:03, Karol Babioch a écrit :
> Hi,
>
> Am 19.01.21 um 19:02 schrieb Kurt Roeckx:
>> In your config file
>> you need to say something like "server ntp.example.org nts". This
>> means you will only accept certificates that have ntp.example.org
>> in the certificate. If you only trust Let's encrypt, you will only
>> trust certificates issued by Let's encrypt for ntp.example.org.
> Yes, that is correct when you specify servers explicitly.
>
>> I have no idea what kind of attack surface you have in mind.
> I'm wondering how this behaves in case of pools, i.e. when I run a
> private pool of NTP servers, i.e. "pool.example.com".
>
> When I have something like this in my chrony.conf:
>
>> pool pool.example.com iburst maxsources 3
> Is NTS even possible in such a context? AFAIK only A records with IP
> addresses are resolved, so I'm not sure if and how certificates can be
> validated.
There is no NTS for the pool for now. Some technical pieces are missing 
and need to be defined/specified.
There are some proposals for using SRV records for NTP/NTS, but any 
projection is premature.
So the problem you are trying to solve does not exist now: you always 
specify the server explicitly in an NTS context.

Emmanuel.