Re: [PATCH] [RFC] timekeeping: Rework frequency adjustments to work better w/ nohz

2014-04-24 Thread Miroslav Lichvar
On Wed, Apr 23, 2014 at 09:22:45PM -0700, John Stultz wrote:
> On 02/12/2014 07:42 AM, Miroslav Lichvar wrote:
> > You can see in this test it takes about 2500 updates to correct the
> > initial ntp error and settle down. That's with 1GHz clocksource. In
> > some tests I did with smaller clock frequencies or different frequency
> > offsets it took much longer than that.
> 
> So I started to look into this slow update issue. I was a little
> confused, as the logarithmic approximation done in the frequency
> correction shouldn't let *that* much initial error accumulate before get
> to the +1/0 adjustments.
> 
> It ends up this is more a reflection of a different part of your patch.
> Particularly the tk->ntp_tick storage. Bascially the ntp_tick variable
> is a cache of the ntp_tick_length() value, however it doesn't get set to
> ntp_tick_length() until *after* you do the first frequency correction.
> Basically this avoids accumulating any error until after the first
> correction is made.

Yes, that was one of the main points of the patch. Postponing the tick
length change should remove the biggest source of the ntp error.

> My main concern is that this seems like an accounting error. By
> basically avoiding accumulating the initial error it seems it would
> never be corrected, no?

Not if we don't consider it to be something that should be corrected.
When the ntp tick length is changed (by adjtimex call or on second
overflow), to which ticks should be this change applied? The current
code always accumulates ntp error with the current tick length, i.e.
the change is effectively applied to already passed ticks since last
accumulation. I'm not saying this is necessarily wrong, but it causes
large ntp errors. I'm proposing to look at the frequency change
in a different way and apply it at the current time in the current
tick when the clock is updated.

In my understanding, there are three sources of ntp error in the
current code:
- change in the tick length is not effectively applied at the current
  time in the clock update
- mult is controlled by an iterative method
- insufficient resolution of mult

I think the first source can be removed by postpoing the tick length
change as explained above. The second source can be removed by
calculating mult precisely with division instead of an iterative
method. There is probably nothing we can do about the third source
(except switching to 64-bit mult), but it's small, predictable and can
be handled easily and cheaply by the +1/0 mult adjustment.

I'm still not convinced the clock can be controlled quickly and
accurately without information about when will be the next clock
update if the first and possibly second source of ntp error remain
there. As you have probably seen when working on the patches, the
requirements are in conflict and it's difficult or maybe not even
possible to get something working well with all different update
intervals, clock multipliers and frequency changes.

>From my view, as someone involved in development of algorithms
controlling clocks, I'd like the clock to be as deterministic as
possible. When I set the frequency, the kernel shouldn't be correcting
some large unknown phase error behind my back. I still wouldn't know
when exactly was the frequency actually set, but if that information
was exported by adjtimex (I have some ideas how to do that), it would
be perfect for me.

Thanks for still working on this.

-- 
Miroslav Lichvar
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [RFC] timekeeping: Rework frequency adjustments to work better w/ nohz

2014-04-24 Thread Miroslav Lichvar
On Wed, Apr 23, 2014 at 09:22:45PM -0700, John Stultz wrote:
 On 02/12/2014 07:42 AM, Miroslav Lichvar wrote:
  You can see in this test it takes about 2500 updates to correct the
  initial ntp error and settle down. That's with 1GHz clocksource. In
  some tests I did with smaller clock frequencies or different frequency
  offsets it took much longer than that.
 
 So I started to look into this slow update issue. I was a little
 confused, as the logarithmic approximation done in the frequency
 correction shouldn't let *that* much initial error accumulate before get
 to the +1/0 adjustments.
 
 It ends up this is more a reflection of a different part of your patch.
 Particularly the tk-ntp_tick storage. Bascially the ntp_tick variable
 is a cache of the ntp_tick_length() value, however it doesn't get set to
 ntp_tick_length() until *after* you do the first frequency correction.
 Basically this avoids accumulating any error until after the first
 correction is made.

Yes, that was one of the main points of the patch. Postponing the tick
length change should remove the biggest source of the ntp error.

 My main concern is that this seems like an accounting error. By
 basically avoiding accumulating the initial error it seems it would
 never be corrected, no?

Not if we don't consider it to be something that should be corrected.
When the ntp tick length is changed (by adjtimex call or on second
overflow), to which ticks should be this change applied? The current
code always accumulates ntp error with the current tick length, i.e.
the change is effectively applied to already passed ticks since last
accumulation. I'm not saying this is necessarily wrong, but it causes
large ntp errors. I'm proposing to look at the frequency change
in a different way and apply it at the current time in the current
tick when the clock is updated.

In my understanding, there are three sources of ntp error in the
current code:
- change in the tick length is not effectively applied at the current
  time in the clock update
- mult is controlled by an iterative method
- insufficient resolution of mult

I think the first source can be removed by postpoing the tick length
change as explained above. The second source can be removed by
calculating mult precisely with division instead of an iterative
method. There is probably nothing we can do about the third source
(except switching to 64-bit mult), but it's small, predictable and can
be handled easily and cheaply by the +1/0 mult adjustment.

I'm still not convinced the clock can be controlled quickly and
accurately without information about when will be the next clock
update if the first and possibly second source of ntp error remain
there. As you have probably seen when working on the patches, the
requirements are in conflict and it's difficult or maybe not even
possible to get something working well with all different update
intervals, clock multipliers and frequency changes.

From my view, as someone involved in development of algorithms
controlling clocks, I'd like the clock to be as deterministic as
possible. When I set the frequency, the kernel shouldn't be correcting
some large unknown phase error behind my back. I still wouldn't know
when exactly was the frequency actually set, but if that information
was exported by adjtimex (I have some ideas how to do that), it would
be perfect for me.

Thanks for still working on this.

-- 
Miroslav Lichvar
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [RFC] timekeeping: Rework frequency adjustments to work better w/ nohz

2014-04-23 Thread John Stultz
Hey Miroslav!

  Once again, a few months pass and I finally get some more time to look
at this. :( Sorry for how slow this has been going.


On 02/12/2014 07:42 AM, Miroslav Lichvar wrote:
> On Mon, Jan 06, 2014 at 07:57:03PM -0800, John Stultz wrote:
>> Got a few cycles to take another look at this, and tried to address
>> Miroslav's latest comments. Please let me know if you have further
>> thoughts!
> In the simulations this version of the patch does indeed work better
> than the previous one. I tested it with the ntp_error correction added
> back as per your previous mail. It converges, the measured frequency
> matches the value set by adjtimex and the ntp error stays close to
> zero. I'm concerned about two things now and I'm not sure if they can
> be fixed easily.
>
> One is that the convergence still seems too slow to me.
>
> ./tk_test -t 500 -n 4000
> samples: 1-500 reg: 1-500 slope: 1.00 dev: 1947.5 max: 36940.9 freq: 
> -100.08081
> samples: 501-1000 reg: 501-1000 slope: 1.00 dev: 430.7 max: 1669.6 freq: 
> -100.07168
> samples: 1001-1500 reg: 1001-1500 slope: 1.00 dev: 675.3 max: 3172.9 freq: 
> -100.07393
> samples: 1501-2000 reg: 1501-2000 slope: 1.00 dev: 453.9 max: 2223.4 freq: 
> -100.07177
> samples: 2001-2500 reg: 2001-2500 slope: 1.00 dev: 2601.2 max: 10875.4 freq: 
> -100.03978
> samples: 2501-3000 reg: 2501-3000 slope: 1.00 dev: 185.6 max: 1251.6 freq: 
> -99.99987
> samples: 3001-3500 reg: 3001-3500 slope: 1.00 dev: 160.1 max: 1181.7 freq: 
> -99.6
> samples: 3501-4000 reg: 3501-4000 slope: 1.00 dev: 150.7 max: 1103.2 freq: 
> -99.0
>
> You can see in this test it takes about 2500 updates to correct the
> initial ntp error and settle down. That's with 1GHz clocksource. In
> some tests I did with smaller clock frequencies or different frequency
> offsets it took much longer than that.

So I started to look into this slow update issue. I was a little
confused, as the logarithmic approximation done in the frequency
correction shouldn't let *that* much initial error accumulate before get
to the +1/0 adjustments.

It ends up this is more a reflection of a different part of your patch.
Particularly the tk->ntp_tick storage. Bascially the ntp_tick variable
is a cache of the ntp_tick_length() value, however it doesn't get set to
ntp_tick_length() until *after* you do the first frequency correction.
Basically this avoids accumulating any error until after the first
correction is made.

My main concern is that this seems like an accounting error. By
basically avoiding accumulating the initial error it seems it would
never be corrected, no?

If I add a similar ntp_tick caching to my current implementation, it
converges practically as fast (with the only initial error coming from
the logarithmic approximation of the divide being quite small). 
Similarly, if you remove the usage of tk->ntp_tick in the error
accumulation loop, using ntp_tick_length() your patch behaves quite
similarly to mine.

Does this align with your view of the code? Or am I misunderstanding
something?

> $ ./tk_test -s 2500 -n 5000
> samples: 1-5000 reg: 2501-5000 slope: 1.00 dev: 135942.4 max: 390761.8 freq: 
> -100.0
>
> Here the regression line is calculated only through the latter half of
> the samples (where it was already settled down) and we can see the
> total ntp error corrected since the first update is around 390
> microseconds.
>
> I'm not saying ntpd or linuxptp can't work with this. The PLL and PI
> controllers seem to handle this slowness in the frequency adjustment
> well. I'm more worried about designs that may change the frequency
> quickly and rely on accurate prediction of the clock in their feedback
> loop.
>
> The other thing I'm concerned about is that the multiplier doesn't
> change only between two closest values when the loop has settled down,
> but in a larger range. With the 1GHz clock I see the multiplier is
> moving in range of 8387767-8387772, which makes the ntp error several
> times larger than it would be if the multiplier switched just between
> 8387769 and 8387770.

So this point is valid. I spent some time reworking things and I'm not
totally happy with it currently (there's a little mess to it), but its
much closer to your implementation logically, but again just avoids the
divide and keeps the accounting closer to what we already have.

I'll hopefully clean up my current work and send it out tomorrow.

Also I do really appreciate your submissions here and apologize I've not
been able to put more time towards it recently. I also know having me
rewrite your patch over and over with various mistakes is probably
frustrating, but I do really want to make sure I both understand all the
subtleties of your changes (which resynthesizing helps) as well as make
sure the accounting is really correct (or at least not changed subtlety
without clear reason).

Thanks so much again!
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to 

Re: [PATCH] [RFC] timekeeping: Rework frequency adjustments to work better w/ nohz

2014-04-23 Thread John Stultz
Hey Miroslav!

  Once again, a few months pass and I finally get some more time to look
at this. :( Sorry for how slow this has been going.


On 02/12/2014 07:42 AM, Miroslav Lichvar wrote:
 On Mon, Jan 06, 2014 at 07:57:03PM -0800, John Stultz wrote:
 Got a few cycles to take another look at this, and tried to address
 Miroslav's latest comments. Please let me know if you have further
 thoughts!
 In the simulations this version of the patch does indeed work better
 than the previous one. I tested it with the ntp_error correction added
 back as per your previous mail. It converges, the measured frequency
 matches the value set by adjtimex and the ntp error stays close to
 zero. I'm concerned about two things now and I'm not sure if they can
 be fixed easily.

 One is that the convergence still seems too slow to me.

 ./tk_test -t 500 -n 4000
 samples: 1-500 reg: 1-500 slope: 1.00 dev: 1947.5 max: 36940.9 freq: 
 -100.08081
 samples: 501-1000 reg: 501-1000 slope: 1.00 dev: 430.7 max: 1669.6 freq: 
 -100.07168
 samples: 1001-1500 reg: 1001-1500 slope: 1.00 dev: 675.3 max: 3172.9 freq: 
 -100.07393
 samples: 1501-2000 reg: 1501-2000 slope: 1.00 dev: 453.9 max: 2223.4 freq: 
 -100.07177
 samples: 2001-2500 reg: 2001-2500 slope: 1.00 dev: 2601.2 max: 10875.4 freq: 
 -100.03978
 samples: 2501-3000 reg: 2501-3000 slope: 1.00 dev: 185.6 max: 1251.6 freq: 
 -99.99987
 samples: 3001-3500 reg: 3001-3500 slope: 1.00 dev: 160.1 max: 1181.7 freq: 
 -99.6
 samples: 3501-4000 reg: 3501-4000 slope: 1.00 dev: 150.7 max: 1103.2 freq: 
 -99.0

 You can see in this test it takes about 2500 updates to correct the
 initial ntp error and settle down. That's with 1GHz clocksource. In
 some tests I did with smaller clock frequencies or different frequency
 offsets it took much longer than that.

So I started to look into this slow update issue. I was a little
confused, as the logarithmic approximation done in the frequency
correction shouldn't let *that* much initial error accumulate before get
to the +1/0 adjustments.

It ends up this is more a reflection of a different part of your patch.
Particularly the tk-ntp_tick storage. Bascially the ntp_tick variable
is a cache of the ntp_tick_length() value, however it doesn't get set to
ntp_tick_length() until *after* you do the first frequency correction.
Basically this avoids accumulating any error until after the first
correction is made.

My main concern is that this seems like an accounting error. By
basically avoiding accumulating the initial error it seems it would
never be corrected, no?

If I add a similar ntp_tick caching to my current implementation, it
converges practically as fast (with the only initial error coming from
the logarithmic approximation of the divide being quite small). 
Similarly, if you remove the usage of tk-ntp_tick in the error
accumulation loop, using ntp_tick_length() your patch behaves quite
similarly to mine.

Does this align with your view of the code? Or am I misunderstanding
something?

 $ ./tk_test -s 2500 -n 5000
 samples: 1-5000 reg: 2501-5000 slope: 1.00 dev: 135942.4 max: 390761.8 freq: 
 -100.0

 Here the regression line is calculated only through the latter half of
 the samples (where it was already settled down) and we can see the
 total ntp error corrected since the first update is around 390
 microseconds.

 I'm not saying ntpd or linuxptp can't work with this. The PLL and PI
 controllers seem to handle this slowness in the frequency adjustment
 well. I'm more worried about designs that may change the frequency
 quickly and rely on accurate prediction of the clock in their feedback
 loop.

 The other thing I'm concerned about is that the multiplier doesn't
 change only between two closest values when the loop has settled down,
 but in a larger range. With the 1GHz clock I see the multiplier is
 moving in range of 8387767-8387772, which makes the ntp error several
 times larger than it would be if the multiplier switched just between
 8387769 and 8387770.

So this point is valid. I spent some time reworking things and I'm not
totally happy with it currently (there's a little mess to it), but its
much closer to your implementation logically, but again just avoids the
divide and keeps the accounting closer to what we already have.

I'll hopefully clean up my current work and send it out tomorrow.

Also I do really appreciate your submissions here and apologize I've not
been able to put more time towards it recently. I also know having me
rewrite your patch over and over with various mistakes is probably
frustrating, but I do really want to make sure I both understand all the
subtleties of your changes (which resynthesizing helps) as well as make
sure the accounting is really correct (or at least not changed subtlety
without clear reason).

Thanks so much again!
-john
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  

Re: [PATCH] [RFC] timekeeping: Rework frequency adjustments to work better w/ nohz

2014-02-12 Thread Miroslav Lichvar
On Mon, Jan 06, 2014 at 07:57:03PM -0800, John Stultz wrote:
> Got a few cycles to take another look at this, and tried to address
> Miroslav's latest comments. Please let me know if you have further
> thoughts!

In the simulations this version of the patch does indeed work better
than the previous one. I tested it with the ntp_error correction added
back as per your previous mail. It converges, the measured frequency
matches the value set by adjtimex and the ntp error stays close to
zero. I'm concerned about two things now and I'm not sure if they can
be fixed easily.

One is that the convergence still seems too slow to me.

./tk_test -t 500 -n 4000
samples: 1-500 reg: 1-500 slope: 1.00 dev: 1947.5 max: 36940.9 freq: -100.08081
samples: 501-1000 reg: 501-1000 slope: 1.00 dev: 430.7 max: 1669.6 freq: 
-100.07168
samples: 1001-1500 reg: 1001-1500 slope: 1.00 dev: 675.3 max: 3172.9 freq: 
-100.07393
samples: 1501-2000 reg: 1501-2000 slope: 1.00 dev: 453.9 max: 2223.4 freq: 
-100.07177
samples: 2001-2500 reg: 2001-2500 slope: 1.00 dev: 2601.2 max: 10875.4 freq: 
-100.03978
samples: 2501-3000 reg: 2501-3000 slope: 1.00 dev: 185.6 max: 1251.6 freq: 
-99.99987
samples: 3001-3500 reg: 3001-3500 slope: 1.00 dev: 160.1 max: 1181.7 freq: 
-99.6
samples: 3501-4000 reg: 3501-4000 slope: 1.00 dev: 150.7 max: 1103.2 freq: 
-99.0

You can see in this test it takes about 2500 updates to correct the
initial ntp error and settle down. That's with 1GHz clocksource. In
some tests I did with smaller clock frequencies or different frequency
offsets it took much longer than that.

$ ./tk_test -s 2500 -n 5000
samples: 1-5000 reg: 2501-5000 slope: 1.00 dev: 135942.4 max: 390761.8 freq: 
-100.0

Here the regression line is calculated only through the latter half of
the samples (where it was already settled down) and we can see the
total ntp error corrected since the first update is around 390
microseconds.

I'm not saying ntpd or linuxptp can't work with this. The PLL and PI
controllers seem to handle this slowness in the frequency adjustment
well. I'm more worried about designs that may change the frequency
quickly and rely on accurate prediction of the clock in their feedback
loop.

The other thing I'm concerned about is that the multiplier doesn't
change only between two closest values when the loop has settled down,
but in a larger range. With the 1GHz clock I see the multiplier is
moving in range of 8387767-8387772, which makes the ntp error several
times larger than it would be if the multiplier switched just between
8387769 and 8387770.

Let's make a comparison between the current kernel (A), your patch
(B), and my patch (C). The script used for these tests is in the
tktest git as test1.sh. The first two columns are errors in the
measured frequency offsets (in ppm) from the first 10 and 100 samples. The
other two columns are time error deviation and maximum (in
nanoseconds) when the loop has converged. Both nohz and nohz off are
tested. The tests are repeated with 100 slightly different frequencies
and the mean value or stddev is presented.

freq10   freq100  dev  max
A, nohz on  38.38368 2.72579  1468940.99846788.2
B, nohz on  1.37940  0.15085  298.51312.5  
C, nohz on  0.00601  0.00028  74.0 279.4
A, nohz off 3.89181  0.10436  0.2  0.6
B, nohz off 1.29396  0.14372  0.2  0.6
C, nohz off 0.05867  0.00204  0.2  0.6

In these tests A with nohz is really bad. C is much better than B in
the frequency control with both nohz enabled and disabled. As for the
time error, with nohz disabled all perform similarly, with nohz
enabled C is about 4 times better than B.

I've attached the latest version of my patch. Some bugfixes and
optimizations were made, but that division is still used in the code.
I've noticed there is a division in logarithmic_accumulation(), which
is used as a workaround for an older compiler bug. Perhaps it could be
removed now, so there is more space for this one?

Please reconsider.

Thanks,

-- 
Miroslav Lichvar
>From 150e7c8ed96d078d025f154c219c0a4367a401fc Mon Sep 17 00:00:00 2001
From: Miroslav Lichvar 
Date: Thu, 14 Nov 2013 11:06:02 +0100
Subject: [PATCHv2] timekeeping: Fix clock stability with nohz

Since commit 5eb6d205, NTP corrections are accumulated independently
from the system clock and the clock multiplier is adjusted in a feedback
loop according to the current error between the system time and NTP
time.

This works well when the multiplier is updated often and regularly. With
nohz and idle system, however, the update interval is so long that the
loop becomes unstable. The frequency of the clock doesn't settle down
and there is a large time error, which seems to grow quadratically with
update interval.

If the constants used in the loop were modified for a maximum update
interval, it would be stable, but too slow to keep up with NTP. Without

Re: [PATCH] [RFC] timekeeping: Rework frequency adjustments to work better w/ nohz

2014-02-12 Thread Miroslav Lichvar
On Mon, Jan 06, 2014 at 07:57:03PM -0800, John Stultz wrote:
 Got a few cycles to take another look at this, and tried to address
 Miroslav's latest comments. Please let me know if you have further
 thoughts!

In the simulations this version of the patch does indeed work better
than the previous one. I tested it with the ntp_error correction added
back as per your previous mail. It converges, the measured frequency
matches the value set by adjtimex and the ntp error stays close to
zero. I'm concerned about two things now and I'm not sure if they can
be fixed easily.

One is that the convergence still seems too slow to me.

./tk_test -t 500 -n 4000
samples: 1-500 reg: 1-500 slope: 1.00 dev: 1947.5 max: 36940.9 freq: -100.08081
samples: 501-1000 reg: 501-1000 slope: 1.00 dev: 430.7 max: 1669.6 freq: 
-100.07168
samples: 1001-1500 reg: 1001-1500 slope: 1.00 dev: 675.3 max: 3172.9 freq: 
-100.07393
samples: 1501-2000 reg: 1501-2000 slope: 1.00 dev: 453.9 max: 2223.4 freq: 
-100.07177
samples: 2001-2500 reg: 2001-2500 slope: 1.00 dev: 2601.2 max: 10875.4 freq: 
-100.03978
samples: 2501-3000 reg: 2501-3000 slope: 1.00 dev: 185.6 max: 1251.6 freq: 
-99.99987
samples: 3001-3500 reg: 3001-3500 slope: 1.00 dev: 160.1 max: 1181.7 freq: 
-99.6
samples: 3501-4000 reg: 3501-4000 slope: 1.00 dev: 150.7 max: 1103.2 freq: 
-99.0

You can see in this test it takes about 2500 updates to correct the
initial ntp error and settle down. That's with 1GHz clocksource. In
some tests I did with smaller clock frequencies or different frequency
offsets it took much longer than that.

$ ./tk_test -s 2500 -n 5000
samples: 1-5000 reg: 2501-5000 slope: 1.00 dev: 135942.4 max: 390761.8 freq: 
-100.0

Here the regression line is calculated only through the latter half of
the samples (where it was already settled down) and we can see the
total ntp error corrected since the first update is around 390
microseconds.

I'm not saying ntpd or linuxptp can't work with this. The PLL and PI
controllers seem to handle this slowness in the frequency adjustment
well. I'm more worried about designs that may change the frequency
quickly and rely on accurate prediction of the clock in their feedback
loop.

The other thing I'm concerned about is that the multiplier doesn't
change only between two closest values when the loop has settled down,
but in a larger range. With the 1GHz clock I see the multiplier is
moving in range of 8387767-8387772, which makes the ntp error several
times larger than it would be if the multiplier switched just between
8387769 and 8387770.

Let's make a comparison between the current kernel (A), your patch
(B), and my patch (C). The script used for these tests is in the
tktest git as test1.sh. The first two columns are errors in the
measured frequency offsets (in ppm) from the first 10 and 100 samples. The
other two columns are time error deviation and maximum (in
nanoseconds) when the loop has converged. Both nohz and nohz off are
tested. The tests are repeated with 100 slightly different frequencies
and the mean value or stddev is presented.

freq10   freq100  dev  max
A, nohz on  38.38368 2.72579  1468940.99846788.2
B, nohz on  1.37940  0.15085  298.51312.5  
C, nohz on  0.00601  0.00028  74.0 279.4
A, nohz off 3.89181  0.10436  0.2  0.6
B, nohz off 1.29396  0.14372  0.2  0.6
C, nohz off 0.05867  0.00204  0.2  0.6

In these tests A with nohz is really bad. C is much better than B in
the frequency control with both nohz enabled and disabled. As for the
time error, with nohz disabled all perform similarly, with nohz
enabled C is about 4 times better than B.

I've attached the latest version of my patch. Some bugfixes and
optimizations were made, but that division is still used in the code.
I've noticed there is a division in logarithmic_accumulation(), which
is used as a workaround for an older compiler bug. Perhaps it could be
removed now, so there is more space for this one?

Please reconsider.

Thanks,

-- 
Miroslav Lichvar
From 150e7c8ed96d078d025f154c219c0a4367a401fc Mon Sep 17 00:00:00 2001
From: Miroslav Lichvar mlich...@redhat.com
Date: Thu, 14 Nov 2013 11:06:02 +0100
Subject: [PATCHv2] timekeeping: Fix clock stability with nohz

Since commit 5eb6d205, NTP corrections are accumulated independently
from the system clock and the clock multiplier is adjusted in a feedback
loop according to the current error between the system time and NTP
time.

This works well when the multiplier is updated often and regularly. With
nohz and idle system, however, the update interval is so long that the
loop becomes unstable. The frequency of the clock doesn't settle down
and there is a large time error, which seems to grow quadratically with
update interval.

If the constants used in the loop were modified for a maximum update
interval, it would be stable, but too slow to keep up 

Re: [PATCH] [RFC] timekeeping: Rework frequency adjustments to work better w/ nohz

2014-02-07 Thread John Stultz
On 02/07/2014 03:45 AM, Miroslav Lichvar wrote:
> On Mon, Jan 06, 2014 at 07:57:03PM -0800, John Stultz wrote:
>> Got a few cycles to take another look at this, and tried to address
>> Miroslav's latest comments. Please let me know if you have further
>> thoughts!
> I've had finally some time to look at this, sorry for the delay.
>
>> I also dropped the tk->ntp_error adjustments, as I've never quite
>> been able to recall how that was correct, and it doesn't seem to 
>> have any affect in the simulator. Will have to proceed carefully
>> with testing here.
> I see some effect of the ntp_error correction in the simulator, but it
> seems rather disruptive than helpful. Perhaps the correction was meant
> to account the fact that for the ntp_error accumulation ntp_tick
> should change at the time when mult is changed instead of start of the
> tick. I tried to find that correction and came up with this:
Yes, so I actually re-added the logic a few days after sending that
patch out.

I realized the issue is that for the accumulation point, we're adjusting
the time forward or backwards, so with the new freq the non-accumulated
offset lines up with the current time. Thus the ntp_error is the error
at that accumulation point, which also needs to be adjusted
appropriately (apologies its much easier to see when drawn).

> (ntp_tick1 - ntp_tick2) * offset / interval + offset
>
> That doesn't look usable. But I don't think it's really necessary to
> have any correction for that and I'm for dropping that line from the
> code as your patch does.

I'll have to try to look over your suggestion here another day. I
unfortunately have a head cold at the moment, so any complicated math is
a bit treacherous. :)


> Anyway, it seems there is a different problem in the code. I modified
> the simulator to see how the code works when the clocksource frequency
> is not divisible by HZ. With TSC_FREQ set to 3579545 (acpi_pm freq)
> ntp_error doesn't settle down and the clock has a large frequency
> error.
>
> On top of your patch, this fixes the problem for me:
>
> --- a/kernel/time/timekeeping.c  
> +++ b/kernel/time/timekeeping.c
> @@ -1065,7 +1065,7 @@ static __always_inline void 
> timekeeping_freqadjust(struct timekeeper *tk,  
>   
> /* Calculate current error per tick */
> tick_error = ntp_tick_length() >> tk->ntp_error_shift;
> -   tick_error -= tk->xtime_interval;
> +   tick_error -= tk->xtime_interval + tk->xtime_remainder;
>
> /* Don't worry about correcting it if its small */
> if (likely(abs(tick_error) < 2*interval))
>
> My patch has this problem too. The original code seems to be affected
> to some extent, it's able to converge to the correct frequency, but
> there is some residual ntp error. Adding xtime_remainder to
> timekeeping_bigadjust() fixes that.
>
> I've some comments on the performance of the proposed code, I'll send
> them in a separate mail later.

Ok.. I'll look into this as well.  Thanks so much for your review and fixes!

thanks!
-john

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [RFC] timekeeping: Rework frequency adjustments to work better w/ nohz

2014-02-07 Thread Miroslav Lichvar
On Mon, Jan 06, 2014 at 07:57:03PM -0800, John Stultz wrote:
> Got a few cycles to take another look at this, and tried to address
> Miroslav's latest comments. Please let me know if you have further
> thoughts!

I've had finally some time to look at this, sorry for the delay.

> I also dropped the tk->ntp_error adjustments, as I've never quite
> been able to recall how that was correct, and it doesn't seem to 
> have any affect in the simulator. Will have to proceed carefully
> with testing here.

I see some effect of the ntp_error correction in the simulator, but it
seems rather disruptive than helpful. Perhaps the correction was meant
to account the fact that for the ntp_error accumulation ntp_tick
should change at the time when mult is changed instead of start of the
tick. I tried to find that correction and came up with this:

(ntp_tick1 - ntp_tick2) * offset / interval + offset

That doesn't look usable. But I don't think it's really necessary to
have any correction for that and I'm for dropping that line from the
code as your patch does.

Anyway, it seems there is a different problem in the code. I modified
the simulator to see how the code works when the clocksource frequency
is not divisible by HZ. With TSC_FREQ set to 3579545 (acpi_pm freq)
ntp_error doesn't settle down and the clock has a large frequency
error.

On top of your patch, this fixes the problem for me:

--- a/kernel/time/timekeeping.c  
+++ b/kernel/time/timekeeping.c
@@ -1065,7 +1065,7 @@ static __always_inline void timekeeping_freqadjust(struct 
timekeeper *tk,  
  
/* Calculate current error per tick */
tick_error = ntp_tick_length() >> tk->ntp_error_shift;
-   tick_error -= tk->xtime_interval;
+   tick_error -= tk->xtime_interval + tk->xtime_remainder;

/* Don't worry about correcting it if its small */
if (likely(abs(tick_error) < 2*interval))

My patch has this problem too. The original code seems to be affected
to some extent, it's able to converge to the correct frequency, but
there is some residual ntp error. Adding xtime_remainder to
timekeeping_bigadjust() fixes that.

I've some comments on the performance of the proposed code, I'll send
them in a separate mail later.

Thanks,

-- 
Miroslav Lichvar
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [RFC] timekeeping: Rework frequency adjustments to work better w/ nohz

2014-02-07 Thread Miroslav Lichvar
On Mon, Jan 06, 2014 at 07:57:03PM -0800, John Stultz wrote:
 Got a few cycles to take another look at this, and tried to address
 Miroslav's latest comments. Please let me know if you have further
 thoughts!

I've had finally some time to look at this, sorry for the delay.

 I also dropped the tk-ntp_error adjustments, as I've never quite
 been able to recall how that was correct, and it doesn't seem to 
 have any affect in the simulator. Will have to proceed carefully
 with testing here.

I see some effect of the ntp_error correction in the simulator, but it
seems rather disruptive than helpful. Perhaps the correction was meant
to account the fact that for the ntp_error accumulation ntp_tick
should change at the time when mult is changed instead of start of the
tick. I tried to find that correction and came up with this:

(ntp_tick1 - ntp_tick2) * offset / interval + offset

That doesn't look usable. But I don't think it's really necessary to
have any correction for that and I'm for dropping that line from the
code as your patch does.

Anyway, it seems there is a different problem in the code. I modified
the simulator to see how the code works when the clocksource frequency
is not divisible by HZ. With TSC_FREQ set to 3579545 (acpi_pm freq)
ntp_error doesn't settle down and the clock has a large frequency
error.

On top of your patch, this fixes the problem for me:

--- a/kernel/time/timekeeping.c  
+++ b/kernel/time/timekeeping.c
@@ -1065,7 +1065,7 @@ static __always_inline void timekeeping_freqadjust(struct 
timekeeper *tk,  
  
/* Calculate current error per tick */
tick_error = ntp_tick_length()  tk-ntp_error_shift;
-   tick_error -= tk-xtime_interval;
+   tick_error -= tk-xtime_interval + tk-xtime_remainder;

/* Don't worry about correcting it if its small */
if (likely(abs(tick_error)  2*interval))

My patch has this problem too. The original code seems to be affected
to some extent, it's able to converge to the correct frequency, but
there is some residual ntp error. Adding xtime_remainder to
timekeeping_bigadjust() fixes that.

I've some comments on the performance of the proposed code, I'll send
them in a separate mail later.

Thanks,

-- 
Miroslav Lichvar
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [RFC] timekeeping: Rework frequency adjustments to work better w/ nohz

2014-02-07 Thread John Stultz
On 02/07/2014 03:45 AM, Miroslav Lichvar wrote:
 On Mon, Jan 06, 2014 at 07:57:03PM -0800, John Stultz wrote:
 Got a few cycles to take another look at this, and tried to address
 Miroslav's latest comments. Please let me know if you have further
 thoughts!
 I've had finally some time to look at this, sorry for the delay.

 I also dropped the tk-ntp_error adjustments, as I've never quite
 been able to recall how that was correct, and it doesn't seem to 
 have any affect in the simulator. Will have to proceed carefully
 with testing here.
 I see some effect of the ntp_error correction in the simulator, but it
 seems rather disruptive than helpful. Perhaps the correction was meant
 to account the fact that for the ntp_error accumulation ntp_tick
 should change at the time when mult is changed instead of start of the
 tick. I tried to find that correction and came up with this:
Yes, so I actually re-added the logic a few days after sending that
patch out.

I realized the issue is that for the accumulation point, we're adjusting
the time forward or backwards, so with the new freq the non-accumulated
offset lines up with the current time. Thus the ntp_error is the error
at that accumulation point, which also needs to be adjusted
appropriately (apologies its much easier to see when drawn).

 (ntp_tick1 - ntp_tick2) * offset / interval + offset

 That doesn't look usable. But I don't think it's really necessary to
 have any correction for that and I'm for dropping that line from the
 code as your patch does.

I'll have to try to look over your suggestion here another day. I
unfortunately have a head cold at the moment, so any complicated math is
a bit treacherous. :)


 Anyway, it seems there is a different problem in the code. I modified
 the simulator to see how the code works when the clocksource frequency
 is not divisible by HZ. With TSC_FREQ set to 3579545 (acpi_pm freq)
 ntp_error doesn't settle down and the clock has a large frequency
 error.

 On top of your patch, this fixes the problem for me:

 --- a/kernel/time/timekeeping.c  
 +++ b/kernel/time/timekeeping.c
 @@ -1065,7 +1065,7 @@ static __always_inline void 
 timekeeping_freqadjust(struct timekeeper *tk,  
   
 /* Calculate current error per tick */
 tick_error = ntp_tick_length()  tk-ntp_error_shift;
 -   tick_error -= tk-xtime_interval;
 +   tick_error -= tk-xtime_interval + tk-xtime_remainder;

 /* Don't worry about correcting it if its small */
 if (likely(abs(tick_error)  2*interval))

 My patch has this problem too. The original code seems to be affected
 to some extent, it's able to converge to the correct frequency, but
 there is some residual ntp error. Adding xtime_remainder to
 timekeeping_bigadjust() fixes that.

 I've some comments on the performance of the proposed code, I'll send
 them in a separate mail later.

Ok.. I'll look into this as well.  Thanks so much for your review and fixes!

thanks!
-john

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [RFC] timekeeping: Rework frequency adjustments to work better w/ nohz

2014-01-29 Thread John Stultz
On Tue, Jan 28, 2014 at 9:58 AM, Richard Cochran
 wrote:
> I tested for a regression using the patched kernel with the nohz=off
> command line option...
>
> On Mon, Jan 13, 2014 at 10:15:18AM -0800, John Stultz wrote:
>> On 01/13/2014 09:51 AM, Richard Cochran wrote:
>> >
>> > - Linux 3.12.7-nohz-plain-20140106nohz-plain.log
>> > - Linux 3.12.7-nohz-plain-20140106 NOHZ=OFF   periodic-plain.log
>> > - Linux 3.12.7-nohz-fix-20140106-1-gd753140   nohz-fix.log
>
> Regression check:
>
>  - Linux homeboy 3.12.7-nohz-fix-20140106-1-gd753140 NOHZ=OFF
>nohz-fix-periodic.log
>
>> > The performance in the log files as reflected in the clock offset is
>> > summarized in this table. The values are in nanoseconds.
>> >
>> >  | | periodic-plain |  nohz-fix |nohz-plain |
>> >  |-++---+---|
>> >  | minimum |  -1.599000e+03 | -1.051000e+03 | -5.373700e+04 |
>> >  | maximum |  +1.311000e+03 | +1.048000e+03 | +6.389500e+04 |
>> >  | mean|  +9.880240e-02 | -7.747305e+01 | +1.597904e+01 |
>> >  | stddev  |  +4.610021e+02 | +3.960978e+02 | +1.491263e+04 |
>
> Comparing the nohz=off case with and without the patch, the three hour
> test looks like this.
>
>   | | periodic-plain | nohz-fix-periodic |
>   |-++---+
>   | minimum |  -1.599000e+03 | -1.427000e+03 |
>   | maximum |  +1.311000e+03 | +1.279000e+03 |
>   | mean|  +9.880240e-02 | -2.710778e+01 |
>   | stddev  |  +4.610021e+02 | +3.974372e+02 |
>
>> >http://linuxptp.sourceforge.net/nohz-fix/current_nohz.png
>> >http://linuxptp.sourceforge.net/nohz-fix/periodic_vs_fix.png
>
> I also made a third graph showing before and after the patch.
>
> http://linuxptp.sourceforge.net/nohz-fix/nohz_regression.png
>
>> If you do get a chance to look again, I'd also be interested if running
>> with nohz=off w/ the fix doesn't show any regression compared to the
>> unmodified nohz=off case.
>
> Looks like there is no regression.

That's great to hear! Looks like we might be able to queue it for 3.15...

Thanks so much again for the detailed testing! I really appreciate it!

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [RFC] timekeeping: Rework frequency adjustments to work better w/ nohz

2014-01-29 Thread John Stultz
On Tue, Jan 28, 2014 at 9:58 AM, Richard Cochran
richardcoch...@gmail.com wrote:
 I tested for a regression using the patched kernel with the nohz=off
 command line option...

 On Mon, Jan 13, 2014 at 10:15:18AM -0800, John Stultz wrote:
 On 01/13/2014 09:51 AM, Richard Cochran wrote:
 
  - Linux 3.12.7-nohz-plain-20140106nohz-plain.log
  - Linux 3.12.7-nohz-plain-20140106 NOHZ=OFF   periodic-plain.log
  - Linux 3.12.7-nohz-fix-20140106-1-gd753140   nohz-fix.log

 Regression check:

  - Linux homeboy 3.12.7-nohz-fix-20140106-1-gd753140 NOHZ=OFF
nohz-fix-periodic.log

  The performance in the log files as reflected in the clock offset is
  summarized in this table. The values are in nanoseconds.
 
   | | periodic-plain |  nohz-fix |nohz-plain |
   |-++---+---|
   | minimum |  -1.599000e+03 | -1.051000e+03 | -5.373700e+04 |
   | maximum |  +1.311000e+03 | +1.048000e+03 | +6.389500e+04 |
   | mean|  +9.880240e-02 | -7.747305e+01 | +1.597904e+01 |
   | stddev  |  +4.610021e+02 | +3.960978e+02 | +1.491263e+04 |

 Comparing the nohz=off case with and without the patch, the three hour
 test looks like this.

   | | periodic-plain | nohz-fix-periodic |
   |-++---+
   | minimum |  -1.599000e+03 | -1.427000e+03 |
   | maximum |  +1.311000e+03 | +1.279000e+03 |
   | mean|  +9.880240e-02 | -2.710778e+01 |
   | stddev  |  +4.610021e+02 | +3.974372e+02 |

 http://linuxptp.sourceforge.net/nohz-fix/current_nohz.png
 http://linuxptp.sourceforge.net/nohz-fix/periodic_vs_fix.png

 I also made a third graph showing before and after the patch.

 http://linuxptp.sourceforge.net/nohz-fix/nohz_regression.png

 If you do get a chance to look again, I'd also be interested if running
 with nohz=off w/ the fix doesn't show any regression compared to the
 unmodified nohz=off case.

 Looks like there is no regression.

That's great to hear! Looks like we might be able to queue it for 3.15...

Thanks so much again for the detailed testing! I really appreciate it!

thanks
-john
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [RFC] timekeeping: Rework frequency adjustments to work better w/ nohz

2014-01-28 Thread Richard Cochran
I tested for a regression using the patched kernel with the nohz=off
command line option...

On Mon, Jan 13, 2014 at 10:15:18AM -0800, John Stultz wrote:
> On 01/13/2014 09:51 AM, Richard Cochran wrote:
> >
> > - Linux 3.12.7-nohz-plain-20140106nohz-plain.log
> > - Linux 3.12.7-nohz-plain-20140106 NOHZ=OFF   periodic-plain.log
> > - Linux 3.12.7-nohz-fix-20140106-1-gd753140   nohz-fix.log

Regression check:

 - Linux homeboy 3.12.7-nohz-fix-20140106-1-gd753140 NOHZ=OFF
   nohz-fix-periodic.log

> > The performance in the log files as reflected in the clock offset is
> > summarized in this table. The values are in nanoseconds.
> >
> >  | | periodic-plain |  nohz-fix |nohz-plain |
> >  |-++---+---|
> >  | minimum |  -1.599000e+03 | -1.051000e+03 | -5.373700e+04 |
> >  | maximum |  +1.311000e+03 | +1.048000e+03 | +6.389500e+04 |
> >  | mean|  +9.880240e-02 | -7.747305e+01 | +1.597904e+01 |
> >  | stddev  |  +4.610021e+02 | +3.960978e+02 | +1.491263e+04 |

Comparing the nohz=off case with and without the patch, the three hour
test looks like this.

  | | periodic-plain | nohz-fix-periodic |
  |-++---+
  | minimum |  -1.599000e+03 | -1.427000e+03 |
  | maximum |  +1.311000e+03 | +1.279000e+03 |
  | mean|  +9.880240e-02 | -2.710778e+01 |
  | stddev  |  +4.610021e+02 | +3.974372e+02 |

> >http://linuxptp.sourceforge.net/nohz-fix/current_nohz.png
> >http://linuxptp.sourceforge.net/nohz-fix/periodic_vs_fix.png

I also made a third graph showing before and after the patch.

http://linuxptp.sourceforge.net/nohz-fix/nohz_regression.png

> If you do get a chance to look again, I'd also be interested if running
> with nohz=off w/ the fix doesn't show any regression compared to the
> unmodified nohz=off case.

Looks like there is no regression.

Thanks,
Richard


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [RFC] timekeeping: Rework frequency adjustments to work better w/ nohz

2014-01-28 Thread Richard Cochran
I tested for a regression using the patched kernel with the nohz=off
command line option...

On Mon, Jan 13, 2014 at 10:15:18AM -0800, John Stultz wrote:
 On 01/13/2014 09:51 AM, Richard Cochran wrote:
 
  - Linux 3.12.7-nohz-plain-20140106nohz-plain.log
  - Linux 3.12.7-nohz-plain-20140106 NOHZ=OFF   periodic-plain.log
  - Linux 3.12.7-nohz-fix-20140106-1-gd753140   nohz-fix.log

Regression check:

 - Linux homeboy 3.12.7-nohz-fix-20140106-1-gd753140 NOHZ=OFF
   nohz-fix-periodic.log

  The performance in the log files as reflected in the clock offset is
  summarized in this table. The values are in nanoseconds.
 
   | | periodic-plain |  nohz-fix |nohz-plain |
   |-++---+---|
   | minimum |  -1.599000e+03 | -1.051000e+03 | -5.373700e+04 |
   | maximum |  +1.311000e+03 | +1.048000e+03 | +6.389500e+04 |
   | mean|  +9.880240e-02 | -7.747305e+01 | +1.597904e+01 |
   | stddev  |  +4.610021e+02 | +3.960978e+02 | +1.491263e+04 |

Comparing the nohz=off case with and without the patch, the three hour
test looks like this.

  | | periodic-plain | nohz-fix-periodic |
  |-++---+
  | minimum |  -1.599000e+03 | -1.427000e+03 |
  | maximum |  +1.311000e+03 | +1.279000e+03 |
  | mean|  +9.880240e-02 | -2.710778e+01 |
  | stddev  |  +4.610021e+02 | +3.974372e+02 |

 http://linuxptp.sourceforge.net/nohz-fix/current_nohz.png
 http://linuxptp.sourceforge.net/nohz-fix/periodic_vs_fix.png

I also made a third graph showing before and after the patch.

http://linuxptp.sourceforge.net/nohz-fix/nohz_regression.png

 If you do get a chance to look again, I'd also be interested if running
 with nohz=off w/ the fix doesn't show any regression compared to the
 unmodified nohz=off case.

Looks like there is no regression.

Thanks,
Richard


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [RFC] timekeeping: Rework frequency adjustments to work better w/ nohz

2014-01-13 Thread Richard Cochran
On Mon, Jan 13, 2014 at 10:15:18AM -0800, John Stultz wrote:
> That's great to hear! Thanks so much, I really appreciate the testing!
> And this is with HZ=?

HZ=1000
 
> If you do get a chance to look again, I'd also be interested if running
> with nohz=off w/ the fix doesn't show any regression compared to the
> unmodified nohz=off case.

Okay, will do.
 
Thanks,
Richard
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [RFC] timekeeping: Rework frequency adjustments to work better w/ nohz

2014-01-13 Thread John Stultz
On 01/13/2014 09:51 AM, Richard Cochran wrote:
> On Mon, Jan 06, 2014 at 07:57:03PM -0800, John Stultz wrote:
>> I still think this is probably 3.15 or later material, but I'd be
>> very interested in feedback, thoughts, and testing.
> Over the weekend I did a short test of this new approach. I compiled
> two kernels, one plain v3.12.7 and one with the proposed fix. The
> test was run on three different kernel variants, the current nohz
> kernel, the same with nohz=off, and the same but with John's patch.
>
> I used a simple test script (see below). Using a PCIe express card as
> a PHC, the Intel Corporation 82574L, I simply let the this clock run
> free (not sync'ed to gps or anything), and then synchronized the Linux
> system clock to the PHC using the phc2sys program with a sample rate
> of once every 32 seconds. Each of the tests ran for at least three
> hours on a system without any other load.
>
> - Linux 3.12.7-nohz-plain-20140106nohz-plain.log
> - Linux 3.12.7-nohz-plain-20140106 NOHZ=OFF   periodic-plain.log
> - Linux 3.12.7-nohz-fix-20140106-1-gd753140   nohz-fix.log
>
> The performance in the log files as reflected in the clock offset is
> summarized in this table. The values are in nanoseconds.
>
>  | | periodic-plain |  nohz-fix |nohz-plain |
>  |-++---+---|
>  | minimum |  -1.599000e+03 | -1.051000e+03 | -5.373700e+04 |
>  | maximum |  +1.311000e+03 | +1.048000e+03 | +6.389500e+04 |
>  | mean|  +9.880240e-02 | -7.747305e+01 | +1.597904e+01 |
>  | stddev  |  +4.610021e+02 | +3.960978e+02 | +1.491263e+04 |
>
> I also made two graphs.
>
>http://linuxptp.sourceforge.net/nohz-fix/current_nohz.png
>http://linuxptp.sourceforge.net/nohz-fix/periodic_vs_fix.png
>
> (The log files and scripts are also in that directory.)
>
> So in this test, it looks like the new approach performed at least as
> well as using regular, periodic ticks.

That's great to hear! Thanks so much, I really appreciate the testing!
And this is with HZ=?

If you do get a chance to look again, I'd also be interested if running
with nohz=off w/ the fix doesn't show any regression compared to the
unmodified nohz=off case.

I've been trying to validate the ntpd case to ensure there aren't
regressions when there's more erratic steering, but unfortunately got
side-tracked with what seems to be an ntpd/virtualization issue.

thanks
-john







--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [RFC] timekeeping: Rework frequency adjustments to work better w/ nohz

2014-01-13 Thread Richard Cochran
On Mon, Jan 06, 2014 at 07:57:03PM -0800, John Stultz wrote:
> 
> I still think this is probably 3.15 or later material, but I'd be
> very interested in feedback, thoughts, and testing.

Over the weekend I did a short test of this new approach. I compiled
two kernels, one plain v3.12.7 and one with the proposed fix. The
test was run on three different kernel variants, the current nohz
kernel, the same with nohz=off, and the same but with John's patch.

I used a simple test script (see below). Using a PCIe express card as
a PHC, the Intel Corporation 82574L, I simply let the this clock run
free (not sync'ed to gps or anything), and then synchronized the Linux
system clock to the PHC using the phc2sys program with a sample rate
of once every 32 seconds. Each of the tests ran for at least three
hours on a system without any other load.

- Linux 3.12.7-nohz-plain-20140106nohz-plain.log
- Linux 3.12.7-nohz-plain-20140106 NOHZ=OFF   periodic-plain.log
- Linux 3.12.7-nohz-fix-20140106-1-gd753140   nohz-fix.log

The performance in the log files as reflected in the clock offset is
summarized in this table. The values are in nanoseconds.

 | | periodic-plain |  nohz-fix |nohz-plain |
 |-++---+---|
 | minimum |  -1.599000e+03 | -1.051000e+03 | -5.373700e+04 |
 | maximum |  +1.311000e+03 | +1.048000e+03 | +6.389500e+04 |
 | mean|  +9.880240e-02 | -7.747305e+01 | +1.597904e+01 |
 | stddev  |  +4.610021e+02 | +3.960978e+02 | +1.491263e+04 |

I also made two graphs.

   http://linuxptp.sourceforge.net/nohz-fix/current_nohz.png
   http://linuxptp.sourceforge.net/nohz-fix/periodic_vs_fix.png

(The log files and scripts are also in that directory.)

So in this test, it looks like the new approach performed at least as
well as using regular, periodic ticks.

Thanks,
Richard

---
# nohz-fix-testing.sh
#
# set the ptp clock time from the system time
#
./testptp -s; ./testptp -g; date
#
# wait a bit to let the clocks drift
#
sleep 10
#
# start the servo
#
./phc2sys -s eth4 -m -q -l7 -O0 -P0 -I0 -R.03125
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [RFC] timekeeping: Rework frequency adjustments to work better w/ nohz

2014-01-13 Thread Richard Cochran
On Mon, Jan 06, 2014 at 07:57:03PM -0800, John Stultz wrote:
 
 I still think this is probably 3.15 or later material, but I'd be
 very interested in feedback, thoughts, and testing.

Over the weekend I did a short test of this new approach. I compiled
two kernels, one plain v3.12.7 and one with the proposed fix. The
test was run on three different kernel variants, the current nohz
kernel, the same with nohz=off, and the same but with John's patch.

I used a simple test script (see below). Using a PCIe express card as
a PHC, the Intel Corporation 82574L, I simply let the this clock run
free (not sync'ed to gps or anything), and then synchronized the Linux
system clock to the PHC using the phc2sys program with a sample rate
of once every 32 seconds. Each of the tests ran for at least three
hours on a system without any other load.

- Linux 3.12.7-nohz-plain-20140106nohz-plain.log
- Linux 3.12.7-nohz-plain-20140106 NOHZ=OFF   periodic-plain.log
- Linux 3.12.7-nohz-fix-20140106-1-gd753140   nohz-fix.log

The performance in the log files as reflected in the clock offset is
summarized in this table. The values are in nanoseconds.

 | | periodic-plain |  nohz-fix |nohz-plain |
 |-++---+---|
 | minimum |  -1.599000e+03 | -1.051000e+03 | -5.373700e+04 |
 | maximum |  +1.311000e+03 | +1.048000e+03 | +6.389500e+04 |
 | mean|  +9.880240e-02 | -7.747305e+01 | +1.597904e+01 |
 | stddev  |  +4.610021e+02 | +3.960978e+02 | +1.491263e+04 |

I also made two graphs.

   http://linuxptp.sourceforge.net/nohz-fix/current_nohz.png
   http://linuxptp.sourceforge.net/nohz-fix/periodic_vs_fix.png

(The log files and scripts are also in that directory.)

So in this test, it looks like the new approach performed at least as
well as using regular, periodic ticks.

Thanks,
Richard

---
# nohz-fix-testing.sh
#
# set the ptp clock time from the system time
#
./testptp -s; ./testptp -g; date
#
# wait a bit to let the clocks drift
#
sleep 10
#
# start the servo
#
./phc2sys -s eth4 -m -q -l7 -O0 -P0 -I0 -R.03125
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [RFC] timekeeping: Rework frequency adjustments to work better w/ nohz

2014-01-13 Thread John Stultz
On 01/13/2014 09:51 AM, Richard Cochran wrote:
 On Mon, Jan 06, 2014 at 07:57:03PM -0800, John Stultz wrote:
 I still think this is probably 3.15 or later material, but I'd be
 very interested in feedback, thoughts, and testing.
 Over the weekend I did a short test of this new approach. I compiled
 two kernels, one plain v3.12.7 and one with the proposed fix. The
 test was run on three different kernel variants, the current nohz
 kernel, the same with nohz=off, and the same but with John's patch.

 I used a simple test script (see below). Using a PCIe express card as
 a PHC, the Intel Corporation 82574L, I simply let the this clock run
 free (not sync'ed to gps or anything), and then synchronized the Linux
 system clock to the PHC using the phc2sys program with a sample rate
 of once every 32 seconds. Each of the tests ran for at least three
 hours on a system without any other load.

 - Linux 3.12.7-nohz-plain-20140106nohz-plain.log
 - Linux 3.12.7-nohz-plain-20140106 NOHZ=OFF   periodic-plain.log
 - Linux 3.12.7-nohz-fix-20140106-1-gd753140   nohz-fix.log

 The performance in the log files as reflected in the clock offset is
 summarized in this table. The values are in nanoseconds.

  | | periodic-plain |  nohz-fix |nohz-plain |
  |-++---+---|
  | minimum |  -1.599000e+03 | -1.051000e+03 | -5.373700e+04 |
  | maximum |  +1.311000e+03 | +1.048000e+03 | +6.389500e+04 |
  | mean|  +9.880240e-02 | -7.747305e+01 | +1.597904e+01 |
  | stddev  |  +4.610021e+02 | +3.960978e+02 | +1.491263e+04 |

 I also made two graphs.

http://linuxptp.sourceforge.net/nohz-fix/current_nohz.png
http://linuxptp.sourceforge.net/nohz-fix/periodic_vs_fix.png

 (The log files and scripts are also in that directory.)

 So in this test, it looks like the new approach performed at least as
 well as using regular, periodic ticks.

That's great to hear! Thanks so much, I really appreciate the testing!
And this is with HZ=?

If you do get a chance to look again, I'd also be interested if running
with nohz=off w/ the fix doesn't show any regression compared to the
unmodified nohz=off case.

I've been trying to validate the ntpd case to ensure there aren't
regressions when there's more erratic steering, but unfortunately got
side-tracked with what seems to be an ntpd/virtualization issue.

thanks
-john







--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [RFC] timekeeping: Rework frequency adjustments to work better w/ nohz

2014-01-13 Thread Richard Cochran
On Mon, Jan 13, 2014 at 10:15:18AM -0800, John Stultz wrote:
 That's great to hear! Thanks so much, I really appreciate the testing!
 And this is with HZ=?

HZ=1000
 
 If you do get a chance to look again, I'd also be interested if running
 with nohz=off w/ the fix doesn't show any regression compared to the
 unmodified nohz=off case.

Okay, will do.
 
Thanks,
Richard
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] [RFC] timekeeping: Rework frequency adjustments to work better w/ nohz

2014-01-06 Thread John Stultz
Got a few cycles to take another look at this, and tried to address
Miroslav's latest comments. Please let me know if you have further
thoughts!

thanks
-john


The existing timekeeping_adjust logic has always been complicated
to understand. Further, since it was developed prior to NOHZ becoming
common, its not surprising it performs poorly when NOHZ is enabled.

Since Miroslav pointed out the problematic nature of the existing code
in the NOHZ case, I've tried to refactor the code to perform better.

The problem with the previous approach was that it tried to adjust
for the total cumulative error using a scaled dampening factor. This
resulted in large errors to be corrected slowly, while small errors
were corrected quickly. With NOHZ the timekeeping code doesn't know
how far out the next tick will be, so this results in bad
over-correction to small errors, and insufficient correction to large
errors.

Inspired by Miroslav's patch, I've refactored the code to try to
address the correction in two steps.

1) Check the future freq error for the next tick, and if the frequency
error is large, try to make sure we correct that error as best we can.

2) Then make a small single unit adjustment to correct any cumulative
error that we've collected.

This method performs fairly well in the simulator Miroslav created.

I also dropped the tk->ntp_error adjustments, as I've never quite
been able to recall how that was correct, and it doesn't seem to 
have any affect in the simulator. Will have to proceed carefully
with testing here.

I still think this is probably 3.15 or later material, but I'd be
very interested in feedback, thoughts, and testing.

Cc: Miroslav Lichvar 
Cc: Richard Cochran 
Cc: Prarit Bhargava 
Reported-by: Miroslav Lichvar 
Signed-off-by: John Stultz 
---
 kernel/time/timekeeping.c | 163 +-
 1 file changed, 61 insertions(+), 102 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 87b4f00..15354d4 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1048,54 +1048,50 @@ static int __init timekeeping_init_ops(void)
 device_initcall(timekeeping_init_ops);
 
 /*
- * If the error is already larger, we look ahead even further
- * to compensate for late or lost adjustments.
+ * Calculate the future error caused by incorrect freq value
+ * and adjust the timekeeper to correct that.
  */
-static __always_inline int timekeeping_bigadjust(struct timekeeper *tk,
-s64 error, s64 *interval,
-s64 *offset)
+static __always_inline void timekeeping_freqadjust(struct timekeeper *tk,
+s64 interval,
+s64 offset)
 {
s64 tick_error, i;
-   u32 look_ahead, adj;
-   s32 error2, mult;
+   u32 adj;
+   s32 mult = 1;
 
-   /*
-* Use the current error value to determine how much to look ahead.
-* The larger the error the slower we adjust for it to avoid problems
-* with losing too many ticks, otherwise we would overadjust and
-* produce an even larger error.  The smaller the adjustment the
-* faster we try to adjust for it, as lost ticks can do less harm
-* here.  This is tuned so that an error of about 1 msec is adjusted
-* within about 1 sec (or 2^20 nsec in 2^SHIFT_HZ ticks).
-*/
-   error2 = tk->ntp_error >> (NTP_SCALE_SHIFT + 22 - 2 * SHIFT_HZ);
-   error2 = abs(error2);
-   for (look_ahead = 0; error2 > 0; look_ahead++)
-   error2 >>= 2;
+   /* Calculate current error per tick */
+   tick_error = ntp_tick_length() >> tk->ntp_error_shift;
+   tick_error -= tk->xtime_interval;
 
-   /*
-* Now calculate the error in (1 << look_ahead) ticks, but first
-* remove the single look ahead already included in the error.
-*/
-   tick_error = ntp_tick_length() >> (tk->ntp_error_shift + 1);
-   tick_error -= tk->xtime_interval >> 1;
-   error = ((error - tick_error) >> look_ahead) + tick_error;
+   /* Don't worry about correcting it if its small */
+   if (likely(abs(tick_error) < 2*interval))
+   return;
 
-   /* Finally calculate the adjustment shift value.  */
-   i = *interval;
-   mult = 1;
-   if (error < 0) {
-   error = -error;
-   *interval = -*interval;
-   *offset = -*offset;
-   mult = -1;
+   if (tick_error < 0) {
+   interval = -interval;
+   offset = -offset;
+   mult = -mult;
}
-   for (adj = 0; error > i; adj++)
-   error >>= 1;
 
-   *interval <<= adj;
-   *offset <<= adj;
-   return mult << adj;
+   /* Sort out the magnitude of the correction */
+   tick_error = abs(tick_error);
+   i = 

[PATCH] [RFC] timekeeping: Rework frequency adjustments to work better w/ nohz

2014-01-06 Thread John Stultz
Got a few cycles to take another look at this, and tried to address
Miroslav's latest comments. Please let me know if you have further
thoughts!

thanks
-john


The existing timekeeping_adjust logic has always been complicated
to understand. Further, since it was developed prior to NOHZ becoming
common, its not surprising it performs poorly when NOHZ is enabled.

Since Miroslav pointed out the problematic nature of the existing code
in the NOHZ case, I've tried to refactor the code to perform better.

The problem with the previous approach was that it tried to adjust
for the total cumulative error using a scaled dampening factor. This
resulted in large errors to be corrected slowly, while small errors
were corrected quickly. With NOHZ the timekeeping code doesn't know
how far out the next tick will be, so this results in bad
over-correction to small errors, and insufficient correction to large
errors.

Inspired by Miroslav's patch, I've refactored the code to try to
address the correction in two steps.

1) Check the future freq error for the next tick, and if the frequency
error is large, try to make sure we correct that error as best we can.

2) Then make a small single unit adjustment to correct any cumulative
error that we've collected.

This method performs fairly well in the simulator Miroslav created.

I also dropped the tk-ntp_error adjustments, as I've never quite
been able to recall how that was correct, and it doesn't seem to 
have any affect in the simulator. Will have to proceed carefully
with testing here.

I still think this is probably 3.15 or later material, but I'd be
very interested in feedback, thoughts, and testing.

Cc: Miroslav Lichvar mlich...@redhat.com
Cc: Richard Cochran richardcoch...@gmail.com
Cc: Prarit Bhargava pra...@redhat.com
Reported-by: Miroslav Lichvar mlich...@redhat.com
Signed-off-by: John Stultz john.stu...@linaro.org
---
 kernel/time/timekeeping.c | 163 +-
 1 file changed, 61 insertions(+), 102 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 87b4f00..15354d4 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1048,54 +1048,50 @@ static int __init timekeeping_init_ops(void)
 device_initcall(timekeeping_init_ops);
 
 /*
- * If the error is already larger, we look ahead even further
- * to compensate for late or lost adjustments.
+ * Calculate the future error caused by incorrect freq value
+ * and adjust the timekeeper to correct that.
  */
-static __always_inline int timekeeping_bigadjust(struct timekeeper *tk,
-s64 error, s64 *interval,
-s64 *offset)
+static __always_inline void timekeeping_freqadjust(struct timekeeper *tk,
+s64 interval,
+s64 offset)
 {
s64 tick_error, i;
-   u32 look_ahead, adj;
-   s32 error2, mult;
+   u32 adj;
+   s32 mult = 1;
 
-   /*
-* Use the current error value to determine how much to look ahead.
-* The larger the error the slower we adjust for it to avoid problems
-* with losing too many ticks, otherwise we would overadjust and
-* produce an even larger error.  The smaller the adjustment the
-* faster we try to adjust for it, as lost ticks can do less harm
-* here.  This is tuned so that an error of about 1 msec is adjusted
-* within about 1 sec (or 2^20 nsec in 2^SHIFT_HZ ticks).
-*/
-   error2 = tk-ntp_error  (NTP_SCALE_SHIFT + 22 - 2 * SHIFT_HZ);
-   error2 = abs(error2);
-   for (look_ahead = 0; error2  0; look_ahead++)
-   error2 = 2;
+   /* Calculate current error per tick */
+   tick_error = ntp_tick_length()  tk-ntp_error_shift;
+   tick_error -= tk-xtime_interval;
 
-   /*
-* Now calculate the error in (1  look_ahead) ticks, but first
-* remove the single look ahead already included in the error.
-*/
-   tick_error = ntp_tick_length()  (tk-ntp_error_shift + 1);
-   tick_error -= tk-xtime_interval  1;
-   error = ((error - tick_error)  look_ahead) + tick_error;
+   /* Don't worry about correcting it if its small */
+   if (likely(abs(tick_error)  2*interval))
+   return;
 
-   /* Finally calculate the adjustment shift value.  */
-   i = *interval;
-   mult = 1;
-   if (error  0) {
-   error = -error;
-   *interval = -*interval;
-   *offset = -*offset;
-   mult = -1;
+   if (tick_error  0) {
+   interval = -interval;
+   offset = -offset;
+   mult = -mult;
}
-   for (adj = 0; error  i; adj++)
-   error = 1;
 
-   *interval = adj;
-   *offset = adj;
-   return mult  adj;
+   /* Sort out the magnitude of the