Re: defects for uses of abs(u64) (was: Re: Regression: can't apply frequency offsets above 1000ppm)

2015-09-23 Thread Neil Brown
Joe Perches  writes:

> On Fri, 2015-09-04 at 18:00 -0700, John Stultz wrote:
>> On Fri, Sep 4, 2015 at 5:57 PM, John Stultz  wrote:
>> > On Thu, Sep 3, 2015 at 4:26 AM, Miroslav Lichvar  
>> > wrote:
>> >> On Wed, Sep 02, 2015 at 04:16:00PM -0700, John Stultz wrote:
>> >>> On Tue, Sep 1, 2015 at 6:14 PM, Nuno Gonçalves  wrote:
>> >>> > And just installing chrony from the feeds. With any kernel from 3.17
>> >>> > you'll have wrong estimates at chronyc sourcestats.
>> >>>
>> >>> Wrong estimates? Could you be more specific about what the failure
>> >>> you're seeing is here? The
>> >>>
>> >>> I installed the image above, which comes with a 4.1.6 kernel, and
>> >>> chrony seems to have gotten my BBB into ~1ms sync w/ servers over the
>> >>> internet fairly quickly (at least according to chronyc tracking).
>> >>
>> >> To see the bug with chronyd the initial offset shouldn't be very close
>> >> to zero, so it's forced to correct the offset by adjusting the
>> >> frequency in a larger step.
>> >>
>> >> I'm attaching a simple C program that prints the frequency offset
>> >> as measured between the REALTIME and MONOTONIC_RAW clocks when the
>> >> adjtimex tick is set to 9000. It should show values close to -10
>> >> ppm and I suspect on the BBB it will be much smaller.
>> >
>> > So I spent some time on this late last night and this afternoon.
>> >
>> > It was a little odd because things don't seem totally broken, but
>> > something isn't quite right.
>> >
>> > Digging around it seems the iterative logrithmic approximation done in
>> > timekeeping_freqadjust() wasn't working right. Instead of making
>> > smaller order alternating positive and negative adjustments, it was
>> > doing strange growing adjustments for the same value that wern't large
>> > enough to actually correct things very quickly. This made it much
>> > slower to adapt to specified frequency values.
>> >
>> > The odd bit, is it seems to come down to:
>> > tick_error = abs(tick_error);
>> >
>> > Haven't chased down why yet, but apparently abs() isn't doing what one
>> > would think when passed a s64 value.
>> 
>> Well.. chasing it down wasn't hard.. from include/linux/kernel.h:
>> /*
>>  * abs() handles unsigned and signed longs, ints, shorts and chars.  For all
>>  * input types abs() returns a signed long.
>>  * abs() should not be used for 64-bit types (s64, u64, long long) - use 
>> abs64()
>>  * for those.
>>  */
>> 
>> Ouch.
>
> Here's a little cocci script that finds more of these in:

Thanks.

Maybe we should also:

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 5582410727cb..aa7d69afdcac 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -208,6 +208,7 @@ extern int _cond_resched(void);
  */
 #define abs(x) ({  \
long ret;   \
+   BUILD_BUG_ON(sizeof(x) > sizeof(long)); \
if (sizeof(x) == sizeof(long)) {\
long __x = (x); \
ret = (__x < 0) ? -__x : __x;   \


so that people won't make the same mistake again.
That finds bugs in
 driver/md/raid10.c
 drivers/gpu/drm/radeon/radeon_display.c
 kernel/time/clocksource.c
 kernel/time/timekeeping.c
 fs/ext4/mballoc.c
 
that your cocci scripted missed.  All "abs(x - y)".

As sector_t can be 32bit and can be 64bit, I wonder if abs_sector()
would be a good idea ... probably not.

Thoughts?

NeilBrown


signature.asc
Description: PGP signature


Re: defects for uses of abs(u64) (was: Re: Regression: can't apply frequency offsets above 1000ppm)

2015-09-23 Thread Neil Brown
Joe Perches  writes:

> On Fri, 2015-09-04 at 18:00 -0700, John Stultz wrote:
>> On Fri, Sep 4, 2015 at 5:57 PM, John Stultz  wrote:
>> > On Thu, Sep 3, 2015 at 4:26 AM, Miroslav Lichvar  
>> > wrote:
>> >> On Wed, Sep 02, 2015 at 04:16:00PM -0700, John Stultz wrote:
>> >>> On Tue, Sep 1, 2015 at 6:14 PM, Nuno Gonçalves  wrote:
>> >>> > And just installing chrony from the feeds. With any kernel from 3.17
>> >>> > you'll have wrong estimates at chronyc sourcestats.
>> >>>
>> >>> Wrong estimates? Could you be more specific about what the failure
>> >>> you're seeing is here? The
>> >>>
>> >>> I installed the image above, which comes with a 4.1.6 kernel, and
>> >>> chrony seems to have gotten my BBB into ~1ms sync w/ servers over the
>> >>> internet fairly quickly (at least according to chronyc tracking).
>> >>
>> >> To see the bug with chronyd the initial offset shouldn't be very close
>> >> to zero, so it's forced to correct the offset by adjusting the
>> >> frequency in a larger step.
>> >>
>> >> I'm attaching a simple C program that prints the frequency offset
>> >> as measured between the REALTIME and MONOTONIC_RAW clocks when the
>> >> adjtimex tick is set to 9000. It should show values close to -10
>> >> ppm and I suspect on the BBB it will be much smaller.
>> >
>> > So I spent some time on this late last night and this afternoon.
>> >
>> > It was a little odd because things don't seem totally broken, but
>> > something isn't quite right.
>> >
>> > Digging around it seems the iterative logrithmic approximation done in
>> > timekeeping_freqadjust() wasn't working right. Instead of making
>> > smaller order alternating positive and negative adjustments, it was
>> > doing strange growing adjustments for the same value that wern't large
>> > enough to actually correct things very quickly. This made it much
>> > slower to adapt to specified frequency values.
>> >
>> > The odd bit, is it seems to come down to:
>> > tick_error = abs(tick_error);
>> >
>> > Haven't chased down why yet, but apparently abs() isn't doing what one
>> > would think when passed a s64 value.
>> 
>> Well.. chasing it down wasn't hard.. from include/linux/kernel.h:
>> /*
>>  * abs() handles unsigned and signed longs, ints, shorts and chars.  For all
>>  * input types abs() returns a signed long.
>>  * abs() should not be used for 64-bit types (s64, u64, long long) - use 
>> abs64()
>>  * for those.
>>  */
>> 
>> Ouch.
>
> Here's a little cocci script that finds more of these in:

Thanks.

Maybe we should also:

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 5582410727cb..aa7d69afdcac 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -208,6 +208,7 @@ extern int _cond_resched(void);
  */
 #define abs(x) ({  \
long ret;   \
+   BUILD_BUG_ON(sizeof(x) > sizeof(long)); \
if (sizeof(x) == sizeof(long)) {\
long __x = (x); \
ret = (__x < 0) ? -__x : __x;   \


so that people won't make the same mistake again.
That finds bugs in
 driver/md/raid10.c
 drivers/gpu/drm/radeon/radeon_display.c
 kernel/time/clocksource.c
 kernel/time/timekeeping.c
 fs/ext4/mballoc.c
 
that your cocci scripted missed.  All "abs(x - y)".

As sector_t can be 32bit and can be 64bit, I wonder if abs_sector()
would be a good idea ... probably not.

Thoughts?

NeilBrown


signature.asc
Description: PGP signature


Re: Regression: can't apply frequency offsets above 1000ppm.

2015-09-08 Thread Nuno Gonçalves
Yes please.

Tested-by: Nuno Goncalves 

Thanks,
Nuno

On Wed, Sep 9, 2015 at 1:52 AM, John Stultz  wrote:
> On Sat, Sep 5, 2015 at 6:41 AM, Nuno Gonçalves  wrote:
>> Considering your last message I just tried to use abs64 instead. Fixes
>> the problem for me.
>
> Yea, I've since simplified my patch in the same way.
>
> Still looking good? Can I get a Tested-by: from you?
>
> thanks
> -john
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression: can't apply frequency offsets above 1000ppm.

2015-09-08 Thread John Stultz
On Sat, Sep 5, 2015 at 6:41 AM, Nuno Gonçalves  wrote:
> Considering your last message I just tried to use abs64 instead. Fixes
> the problem for me.

Yea, I've since simplified my patch in the same way.

Still looking good? Can I get a Tested-by: from you?

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression: can't apply frequency offsets above 1000ppm.

2015-09-08 Thread John Stultz
On Sat, Sep 5, 2015 at 6:41 AM, Nuno Gonçalves  wrote:
> Considering your last message I just tried to use abs64 instead. Fixes
> the problem for me.

Yea, I've since simplified my patch in the same way.

Still looking good? Can I get a Tested-by: from you?

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression: can't apply frequency offsets above 1000ppm.

2015-09-08 Thread Nuno Gonçalves
Yes please.

Tested-by: Nuno Goncalves 

Thanks,
Nuno

On Wed, Sep 9, 2015 at 1:52 AM, John Stultz  wrote:
> On Sat, Sep 5, 2015 at 6:41 AM, Nuno Gonçalves  wrote:
>> Considering your last message I just tried to use abs64 instead. Fixes
>> the problem for me.
>
> Yea, I've since simplified my patch in the same way.
>
> Still looking good? Can I get a Tested-by: from you?
>
> thanks
> -john
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression: can't apply frequency offsets above 1000ppm.

2015-09-05 Thread Nuno Gonçalves
Considering your last message I just tried to use abs64 instead. Fixes
the problem for me.

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 946acb7..414d9df 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1615,7 +1615,7 @@ static __always_inline void
timekeeping_freqadjust(struct timekeeper *tk,
  negative = (tick_error < 0);

  /* Sort out the magnitude of the correction */
- tick_error = abs(tick_error);
+ tick_error = abs64(tick_error);
  for (adj = 0; tick_error > interval; adj++)
  tick_error >>= 1;

Thanks!
Nuno

On Sat, Sep 5, 2015 at 2:00 AM, John Stultz  wrote:
> On Fri, Sep 4, 2015 at 5:57 PM, John Stultz  wrote:
>> On Thu, Sep 3, 2015 at 4:26 AM, Miroslav Lichvar  wrote:
>>> On Wed, Sep 02, 2015 at 04:16:00PM -0700, John Stultz wrote:
 On Tue, Sep 1, 2015 at 6:14 PM, Nuno Gonçalves  wrote:
 > And just installing chrony from the feeds. With any kernel from 3.17
 > you'll have wrong estimates at chronyc sourcestats.

 Wrong estimates? Could you be more specific about what the failure
 you're seeing is here? The

 I installed the image above, which comes with a 4.1.6 kernel, and
 chrony seems to have gotten my BBB into ~1ms sync w/ servers over the
 internet fairly quickly (at least according to chronyc tracking).
>>>
>>> To see the bug with chronyd the initial offset shouldn't be very close
>>> to zero, so it's forced to correct the offset by adjusting the
>>> frequency in a larger step.
>>>
>>> I'm attaching a simple C program that prints the frequency offset
>>> as measured between the REALTIME and MONOTONIC_RAW clocks when the
>>> adjtimex tick is set to 9000. It should show values close to -10
>>> ppm and I suspect on the BBB it will be much smaller.
>>
>> So I spent some time on this late last night and this afternoon.
>>
>> It was a little odd because things don't seem totally broken, but
>> something isn't quite right.
>>
>> Digging around it seems the iterative logrithmic approximation done in
>> timekeeping_freqadjust() wasn't working right. Instead of making
>> smaller order alternating positive and negative adjustments, it was
>> doing strange growing adjustments for the same value that wern't large
>> enough to actually correct things very quickly. This made it much
>> slower to adapt to specified frequency values.
>>
>> The odd bit, is it seems to come down to:
>> tick_error = abs(tick_error);
>>
>> Haven't chased down why yet, but apparently abs() isn't doing what one
>> would think when passed a s64 value.
>
> Well.. chasing it down wasn't hard.. from include/linux/kernel.h:
> /*
>  * abs() handles unsigned and signed longs, ints, shorts and chars.  For all
>  * input types abs() returns a signed long.
>  * abs() should not be used for 64-bit types (s64, u64, long long) - use 
> abs64()
>  * for those.
>  */
>
> Ouch.
> -john
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression: can't apply frequency offsets above 1000ppm.

2015-09-05 Thread Nuno Gonçalves
Considering your last message I just tried to use abs64 instead. Fixes
the problem for me.

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 946acb7..414d9df 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1615,7 +1615,7 @@ static __always_inline void
timekeeping_freqadjust(struct timekeeper *tk,
  negative = (tick_error < 0);

  /* Sort out the magnitude of the correction */
- tick_error = abs(tick_error);
+ tick_error = abs64(tick_error);
  for (adj = 0; tick_error > interval; adj++)
  tick_error >>= 1;

Thanks!
Nuno

On Sat, Sep 5, 2015 at 2:00 AM, John Stultz  wrote:
> On Fri, Sep 4, 2015 at 5:57 PM, John Stultz  wrote:
>> On Thu, Sep 3, 2015 at 4:26 AM, Miroslav Lichvar  wrote:
>>> On Wed, Sep 02, 2015 at 04:16:00PM -0700, John Stultz wrote:
 On Tue, Sep 1, 2015 at 6:14 PM, Nuno Gonçalves  wrote:
 > And just installing chrony from the feeds. With any kernel from 3.17
 > you'll have wrong estimates at chronyc sourcestats.

 Wrong estimates? Could you be more specific about what the failure
 you're seeing is here? The

 I installed the image above, which comes with a 4.1.6 kernel, and
 chrony seems to have gotten my BBB into ~1ms sync w/ servers over the
 internet fairly quickly (at least according to chronyc tracking).
>>>
>>> To see the bug with chronyd the initial offset shouldn't be very close
>>> to zero, so it's forced to correct the offset by adjusting the
>>> frequency in a larger step.
>>>
>>> I'm attaching a simple C program that prints the frequency offset
>>> as measured between the REALTIME and MONOTONIC_RAW clocks when the
>>> adjtimex tick is set to 9000. It should show values close to -10
>>> ppm and I suspect on the BBB it will be much smaller.
>>
>> So I spent some time on this late last night and this afternoon.
>>
>> It was a little odd because things don't seem totally broken, but
>> something isn't quite right.
>>
>> Digging around it seems the iterative logrithmic approximation done in
>> timekeeping_freqadjust() wasn't working right. Instead of making
>> smaller order alternating positive and negative adjustments, it was
>> doing strange growing adjustments for the same value that wern't large
>> enough to actually correct things very quickly. This made it much
>> slower to adapt to specified frequency values.
>>
>> The odd bit, is it seems to come down to:
>> tick_error = abs(tick_error);
>>
>> Haven't chased down why yet, but apparently abs() isn't doing what one
>> would think when passed a s64 value.
>
> Well.. chasing it down wasn't hard.. from include/linux/kernel.h:
> /*
>  * abs() handles unsigned and signed longs, ints, shorts and chars.  For all
>  * input types abs() returns a signed long.
>  * abs() should not be used for 64-bit types (s64, u64, long long) - use 
> abs64()
>  * for those.
>  */
>
> Ouch.
> -john
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


defects for uses of abs(u64) (was: Re: Regression: can't apply frequency offsets above 1000ppm)

2015-09-04 Thread Joe Perches
On Fri, 2015-09-04 at 18:00 -0700, John Stultz wrote:
> On Fri, Sep 4, 2015 at 5:57 PM, John Stultz  wrote:
> > On Thu, Sep 3, 2015 at 4:26 AM, Miroslav Lichvar  
> > wrote:
> >> On Wed, Sep 02, 2015 at 04:16:00PM -0700, John Stultz wrote:
> >>> On Tue, Sep 1, 2015 at 6:14 PM, Nuno Gonçalves  wrote:
> >>> > And just installing chrony from the feeds. With any kernel from 3.17
> >>> > you'll have wrong estimates at chronyc sourcestats.
> >>>
> >>> Wrong estimates? Could you be more specific about what the failure
> >>> you're seeing is here? The
> >>>
> >>> I installed the image above, which comes with a 4.1.6 kernel, and
> >>> chrony seems to have gotten my BBB into ~1ms sync w/ servers over the
> >>> internet fairly quickly (at least according to chronyc tracking).
> >>
> >> To see the bug with chronyd the initial offset shouldn't be very close
> >> to zero, so it's forced to correct the offset by adjusting the
> >> frequency in a larger step.
> >>
> >> I'm attaching a simple C program that prints the frequency offset
> >> as measured between the REALTIME and MONOTONIC_RAW clocks when the
> >> adjtimex tick is set to 9000. It should show values close to -10
> >> ppm and I suspect on the BBB it will be much smaller.
> >
> > So I spent some time on this late last night and this afternoon.
> >
> > It was a little odd because things don't seem totally broken, but
> > something isn't quite right.
> >
> > Digging around it seems the iterative logrithmic approximation done in
> > timekeeping_freqadjust() wasn't working right. Instead of making
> > smaller order alternating positive and negative adjustments, it was
> > doing strange growing adjustments for the same value that wern't large
> > enough to actually correct things very quickly. This made it much
> > slower to adapt to specified frequency values.
> >
> > The odd bit, is it seems to come down to:
> > tick_error = abs(tick_error);
> >
> > Haven't chased down why yet, but apparently abs() isn't doing what one
> > would think when passed a s64 value.
> 
> Well.. chasing it down wasn't hard.. from include/linux/kernel.h:
> /*
>  * abs() handles unsigned and signed longs, ints, shorts and chars.  For all
>  * input types abs() returns a signed long.
>  * abs() should not be used for 64-bit types (s64, u64, long long) - use 
> abs64()
>  * for those.
>  */
> 
> Ouch.

Here's a little cocci script that finds more of these in:

lib/percpu_counter.c
drivers/input/joystick/walkera0701.c
drivers/md/raid5.c
drivers/spi/spi-pxa2xx.c
fs/f2fs/debug.c

$ cat abs.cocci
@@
u64 t;
@@

*   abs(t)

@@
s64 t;
@@

*   abs(t)

@@
long long t;
@@

*   abs(t)

@@
unsigned long long t;
@@

*   abs(t)

@@
uint64_t t;
@@

*   abs(t)

@@
int64_t t;
@@

*   abs(t)

$

diff -u -p ./lib/percpu_counter.c /tmp/nothing/lib/percpu_counter.c
--- ./lib/percpu_counter.c
+++ /tmp/nothing/lib/percpu_counter.c
@@ -203,7 +203,6 @@ int __percpu_counter_compare(struct perc
 
count = percpu_counter_read(fbc);
/* Check to see if rough count will be sufficient for comparison */
-   if (abs(count - rhs) > (batch * num_online_cpus())) {
if (count > rhs)
return 1;
else

diff -u -p ./drivers/input/joystick/walkera0701.c 
/tmp/nothing/drivers/input/joystick/walkera0701.c
--- ./drivers/input/joystick/walkera0701.c
+++ /tmp/nothing/drivers/input/joystick/walkera0701.c
@@ -150,7 +150,6 @@ static void walkera0701_irq_handler(void
if (w->counter == 24) { /* full frame */
walkera0701_parse_frame(w);
w->counter = NO_SYNC;
-   if (abs(pulse_time - SYNC_PULSE) < RESERVE) /* new 
frame sync */
w->counter = 0;
} else {
if ((pulse_time > (ANALOG_MIN_PULSE - RESERVE)
@@ -161,7 +160,6 @@ static void walkera0701_irq_handler(void
} else
w->counter = NO_SYNC;
}
-   } else if (abs(pulse_time - SYNC_PULSE - BIN0_PULSE) <
RESERVE + BIN1_PULSE - BIN0_PULSE)  /* 
frame sync .. */
w->counter = 0;

diff -u -p ./drivers/md/raid5.c /tmp/nothing/drivers/md/raid5.c
--- ./drivers/md/raid5.c
+++ /tmp/nothing/drivers/md/raid5.c
@@ -6701,8 +6701,6 @@ static int run(struct mddev *mddev)
 * readonly mode so it can take control before
 * allowing any writes.  So just check for that.
 */
-   if (abs(min_offset_diff) >= mddev->chunk_sectors &&
-   abs(min_offset_diff) >= mddev->new_chunk_sectors)
/* not really in-place - so OK */;
else if (mddev->ro == 0) {
printk(KERN_ERR "md/raid:%s: in-place reshape "

diff -u -p ./drivers/spi/spi-pxa2xx.c 

Re: Regression: can't apply frequency offsets above 1000ppm.

2015-09-04 Thread John Stultz
On Fri, Sep 4, 2015 at 5:57 PM, John Stultz  wrote:
> On Thu, Sep 3, 2015 at 4:26 AM, Miroslav Lichvar  wrote:
>> On Wed, Sep 02, 2015 at 04:16:00PM -0700, John Stultz wrote:
>>> On Tue, Sep 1, 2015 at 6:14 PM, Nuno Gonçalves  wrote:
>>> > And just installing chrony from the feeds. With any kernel from 3.17
>>> > you'll have wrong estimates at chronyc sourcestats.
>>>
>>> Wrong estimates? Could you be more specific about what the failure
>>> you're seeing is here? The
>>>
>>> I installed the image above, which comes with a 4.1.6 kernel, and
>>> chrony seems to have gotten my BBB into ~1ms sync w/ servers over the
>>> internet fairly quickly (at least according to chronyc tracking).
>>
>> To see the bug with chronyd the initial offset shouldn't be very close
>> to zero, so it's forced to correct the offset by adjusting the
>> frequency in a larger step.
>>
>> I'm attaching a simple C program that prints the frequency offset
>> as measured between the REALTIME and MONOTONIC_RAW clocks when the
>> adjtimex tick is set to 9000. It should show values close to -10
>> ppm and I suspect on the BBB it will be much smaller.
>
> So I spent some time on this late last night and this afternoon.
>
> It was a little odd because things don't seem totally broken, but
> something isn't quite right.
>
> Digging around it seems the iterative logrithmic approximation done in
> timekeeping_freqadjust() wasn't working right. Instead of making
> smaller order alternating positive and negative adjustments, it was
> doing strange growing adjustments for the same value that wern't large
> enough to actually correct things very quickly. This made it much
> slower to adapt to specified frequency values.
>
> The odd bit, is it seems to come down to:
> tick_error = abs(tick_error);
>
> Haven't chased down why yet, but apparently abs() isn't doing what one
> would think when passed a s64 value.

Well.. chasing it down wasn't hard.. from include/linux/kernel.h:
/*
 * abs() handles unsigned and signed longs, ints, shorts and chars.  For all
 * input types abs() returns a signed long.
 * abs() should not be used for 64-bit types (s64, u64, long long) - use abs64()
 * for those.
 */

Ouch.
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression: can't apply frequency offsets above 1000ppm.

2015-09-04 Thread John Stultz
On Thu, Sep 3, 2015 at 4:26 AM, Miroslav Lichvar  wrote:
> On Wed, Sep 02, 2015 at 04:16:00PM -0700, John Stultz wrote:
>> On Tue, Sep 1, 2015 at 6:14 PM, Nuno Gonçalves  wrote:
>> > And just installing chrony from the feeds. With any kernel from 3.17
>> > you'll have wrong estimates at chronyc sourcestats.
>>
>> Wrong estimates? Could you be more specific about what the failure
>> you're seeing is here? The
>>
>> I installed the image above, which comes with a 4.1.6 kernel, and
>> chrony seems to have gotten my BBB into ~1ms sync w/ servers over the
>> internet fairly quickly (at least according to chronyc tracking).
>
> To see the bug with chronyd the initial offset shouldn't be very close
> to zero, so it's forced to correct the offset by adjusting the
> frequency in a larger step.
>
> I'm attaching a simple C program that prints the frequency offset
> as measured between the REALTIME and MONOTONIC_RAW clocks when the
> adjtimex tick is set to 9000. It should show values close to -10
> ppm and I suspect on the BBB it will be much smaller.

So I spent some time on this late last night and this afternoon.

It was a little odd because things don't seem totally broken, but
something isn't quite right.

Digging around it seems the iterative logrithmic approximation done in
timekeeping_freqadjust() wasn't working right. Instead of making
smaller order alternating positive and negative adjustments, it was
doing strange growing adjustments for the same value that wern't large
enough to actually correct things very quickly. This made it much
slower to adapt to specified frequency values.

The odd bit, is it seems to come down to:
tick_error = abs(tick_error);

Haven't chased down why yet, but apparently abs() isn't doing what one
would think when passed a s64 value.

Anyway, the attached patch seems to improve things for me. If you can
confirm it resolves things for you I'll run it through some additional
testing after the (long holiday) weekend is over and try to get the
fix pushed out.

Thanks again for the issue report!
-john
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index bca3667..81dc975 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1586,7 +1586,7 @@ static __always_inline void timekeeping_freqadjust(struct timekeeper *tk,
 	s64 interval = tk->cycle_interval;
 	s64 xinterval = tk->xtime_interval;
 	s64 tick_error;
-	bool negative;
+	bool negative = 0;
 	u32 adj;
 
 	/* Remove any current error adj from freq calculation */
@@ -1604,10 +1604,12 @@ static __always_inline void timekeeping_freqadjust(struct timekeeper *tk,
 		return;
 
 	/* preserve the direction of correction */
-	negative = (tick_error < 0);
+	if (tick_error < 0) {
+		tick_error = -tick_error;
+		negative = 1;
+	}
 
 	/* Sort out the magnitude of the correction */
-	tick_error = abs(tick_error);
 	for (adj = 0; tick_error > interval; adj++)
 		tick_error >>= 1;
 


Re: Regression: can't apply frequency offsets above 1000ppm.

2015-09-04 Thread John Stultz
On Thu, Sep 3, 2015 at 4:26 AM, Miroslav Lichvar  wrote:
> On Wed, Sep 02, 2015 at 04:16:00PM -0700, John Stultz wrote:
>> On Tue, Sep 1, 2015 at 6:14 PM, Nuno Gonçalves  wrote:
>> > And just installing chrony from the feeds. With any kernel from 3.17
>> > you'll have wrong estimates at chronyc sourcestats.
>>
>> Wrong estimates? Could you be more specific about what the failure
>> you're seeing is here? The
>>
>> I installed the image above, which comes with a 4.1.6 kernel, and
>> chrony seems to have gotten my BBB into ~1ms sync w/ servers over the
>> internet fairly quickly (at least according to chronyc tracking).
>
> To see the bug with chronyd the initial offset shouldn't be very close
> to zero, so it's forced to correct the offset by adjusting the
> frequency in a larger step.
>
> I'm attaching a simple C program that prints the frequency offset
> as measured between the REALTIME and MONOTONIC_RAW clocks when the
> adjtimex tick is set to 9000. It should show values close to -10
> ppm and I suspect on the BBB it will be much smaller.

So I spent some time on this late last night and this afternoon.

It was a little odd because things don't seem totally broken, but
something isn't quite right.

Digging around it seems the iterative logrithmic approximation done in
timekeeping_freqadjust() wasn't working right. Instead of making
smaller order alternating positive and negative adjustments, it was
doing strange growing adjustments for the same value that wern't large
enough to actually correct things very quickly. This made it much
slower to adapt to specified frequency values.

The odd bit, is it seems to come down to:
tick_error = abs(tick_error);

Haven't chased down why yet, but apparently abs() isn't doing what one
would think when passed a s64 value.

Anyway, the attached patch seems to improve things for me. If you can
confirm it resolves things for you I'll run it through some additional
testing after the (long holiday) weekend is over and try to get the
fix pushed out.

Thanks again for the issue report!
-john
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index bca3667..81dc975 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1586,7 +1586,7 @@ static __always_inline void timekeeping_freqadjust(struct timekeeper *tk,
 	s64 interval = tk->cycle_interval;
 	s64 xinterval = tk->xtime_interval;
 	s64 tick_error;
-	bool negative;
+	bool negative = 0;
 	u32 adj;
 
 	/* Remove any current error adj from freq calculation */
@@ -1604,10 +1604,12 @@ static __always_inline void timekeeping_freqadjust(struct timekeeper *tk,
 		return;
 
 	/* preserve the direction of correction */
-	negative = (tick_error < 0);
+	if (tick_error < 0) {
+		tick_error = -tick_error;
+		negative = 1;
+	}
 
 	/* Sort out the magnitude of the correction */
-	tick_error = abs(tick_error);
 	for (adj = 0; tick_error > interval; adj++)
 		tick_error >>= 1;
 


defects for uses of abs(u64) (was: Re: Regression: can't apply frequency offsets above 1000ppm)

2015-09-04 Thread Joe Perches
On Fri, 2015-09-04 at 18:00 -0700, John Stultz wrote:
> On Fri, Sep 4, 2015 at 5:57 PM, John Stultz  wrote:
> > On Thu, Sep 3, 2015 at 4:26 AM, Miroslav Lichvar  
> > wrote:
> >> On Wed, Sep 02, 2015 at 04:16:00PM -0700, John Stultz wrote:
> >>> On Tue, Sep 1, 2015 at 6:14 PM, Nuno Gonçalves  wrote:
> >>> > And just installing chrony from the feeds. With any kernel from 3.17
> >>> > you'll have wrong estimates at chronyc sourcestats.
> >>>
> >>> Wrong estimates? Could you be more specific about what the failure
> >>> you're seeing is here? The
> >>>
> >>> I installed the image above, which comes with a 4.1.6 kernel, and
> >>> chrony seems to have gotten my BBB into ~1ms sync w/ servers over the
> >>> internet fairly quickly (at least according to chronyc tracking).
> >>
> >> To see the bug with chronyd the initial offset shouldn't be very close
> >> to zero, so it's forced to correct the offset by adjusting the
> >> frequency in a larger step.
> >>
> >> I'm attaching a simple C program that prints the frequency offset
> >> as measured between the REALTIME and MONOTONIC_RAW clocks when the
> >> adjtimex tick is set to 9000. It should show values close to -10
> >> ppm and I suspect on the BBB it will be much smaller.
> >
> > So I spent some time on this late last night and this afternoon.
> >
> > It was a little odd because things don't seem totally broken, but
> > something isn't quite right.
> >
> > Digging around it seems the iterative logrithmic approximation done in
> > timekeeping_freqadjust() wasn't working right. Instead of making
> > smaller order alternating positive and negative adjustments, it was
> > doing strange growing adjustments for the same value that wern't large
> > enough to actually correct things very quickly. This made it much
> > slower to adapt to specified frequency values.
> >
> > The odd bit, is it seems to come down to:
> > tick_error = abs(tick_error);
> >
> > Haven't chased down why yet, but apparently abs() isn't doing what one
> > would think when passed a s64 value.
> 
> Well.. chasing it down wasn't hard.. from include/linux/kernel.h:
> /*
>  * abs() handles unsigned and signed longs, ints, shorts and chars.  For all
>  * input types abs() returns a signed long.
>  * abs() should not be used for 64-bit types (s64, u64, long long) - use 
> abs64()
>  * for those.
>  */
> 
> Ouch.

Here's a little cocci script that finds more of these in:

lib/percpu_counter.c
drivers/input/joystick/walkera0701.c
drivers/md/raid5.c
drivers/spi/spi-pxa2xx.c
fs/f2fs/debug.c

$ cat abs.cocci
@@
u64 t;
@@

*   abs(t)

@@
s64 t;
@@

*   abs(t)

@@
long long t;
@@

*   abs(t)

@@
unsigned long long t;
@@

*   abs(t)

@@
uint64_t t;
@@

*   abs(t)

@@
int64_t t;
@@

*   abs(t)

$

diff -u -p ./lib/percpu_counter.c /tmp/nothing/lib/percpu_counter.c
--- ./lib/percpu_counter.c
+++ /tmp/nothing/lib/percpu_counter.c
@@ -203,7 +203,6 @@ int __percpu_counter_compare(struct perc
 
count = percpu_counter_read(fbc);
/* Check to see if rough count will be sufficient for comparison */
-   if (abs(count - rhs) > (batch * num_online_cpus())) {
if (count > rhs)
return 1;
else

diff -u -p ./drivers/input/joystick/walkera0701.c 
/tmp/nothing/drivers/input/joystick/walkera0701.c
--- ./drivers/input/joystick/walkera0701.c
+++ /tmp/nothing/drivers/input/joystick/walkera0701.c
@@ -150,7 +150,6 @@ static void walkera0701_irq_handler(void
if (w->counter == 24) { /* full frame */
walkera0701_parse_frame(w);
w->counter = NO_SYNC;
-   if (abs(pulse_time - SYNC_PULSE) < RESERVE) /* new 
frame sync */
w->counter = 0;
} else {
if ((pulse_time > (ANALOG_MIN_PULSE - RESERVE)
@@ -161,7 +160,6 @@ static void walkera0701_irq_handler(void
} else
w->counter = NO_SYNC;
}
-   } else if (abs(pulse_time - SYNC_PULSE - BIN0_PULSE) <
RESERVE + BIN1_PULSE - BIN0_PULSE)  /* 
frame sync .. */
w->counter = 0;

diff -u -p ./drivers/md/raid5.c /tmp/nothing/drivers/md/raid5.c
--- ./drivers/md/raid5.c
+++ /tmp/nothing/drivers/md/raid5.c
@@ -6701,8 +6701,6 @@ static int run(struct mddev *mddev)
 * readonly mode so it can take control before
 * allowing any writes.  So just check for that.
 */
-   if (abs(min_offset_diff) >= mddev->chunk_sectors &&
-   abs(min_offset_diff) >= mddev->new_chunk_sectors)
/* not really in-place - so OK */;
else if (mddev->ro == 0) {
printk(KERN_ERR "md/raid:%s: 

Re: Regression: can't apply frequency offsets above 1000ppm.

2015-09-04 Thread John Stultz
On Fri, Sep 4, 2015 at 5:57 PM, John Stultz  wrote:
> On Thu, Sep 3, 2015 at 4:26 AM, Miroslav Lichvar  wrote:
>> On Wed, Sep 02, 2015 at 04:16:00PM -0700, John Stultz wrote:
>>> On Tue, Sep 1, 2015 at 6:14 PM, Nuno Gonçalves  wrote:
>>> > And just installing chrony from the feeds. With any kernel from 3.17
>>> > you'll have wrong estimates at chronyc sourcestats.
>>>
>>> Wrong estimates? Could you be more specific about what the failure
>>> you're seeing is here? The
>>>
>>> I installed the image above, which comes with a 4.1.6 kernel, and
>>> chrony seems to have gotten my BBB into ~1ms sync w/ servers over the
>>> internet fairly quickly (at least according to chronyc tracking).
>>
>> To see the bug with chronyd the initial offset shouldn't be very close
>> to zero, so it's forced to correct the offset by adjusting the
>> frequency in a larger step.
>>
>> I'm attaching a simple C program that prints the frequency offset
>> as measured between the REALTIME and MONOTONIC_RAW clocks when the
>> adjtimex tick is set to 9000. It should show values close to -10
>> ppm and I suspect on the BBB it will be much smaller.
>
> So I spent some time on this late last night and this afternoon.
>
> It was a little odd because things don't seem totally broken, but
> something isn't quite right.
>
> Digging around it seems the iterative logrithmic approximation done in
> timekeeping_freqadjust() wasn't working right. Instead of making
> smaller order alternating positive and negative adjustments, it was
> doing strange growing adjustments for the same value that wern't large
> enough to actually correct things very quickly. This made it much
> slower to adapt to specified frequency values.
>
> The odd bit, is it seems to come down to:
> tick_error = abs(tick_error);
>
> Haven't chased down why yet, but apparently abs() isn't doing what one
> would think when passed a s64 value.

Well.. chasing it down wasn't hard.. from include/linux/kernel.h:
/*
 * abs() handles unsigned and signed longs, ints, shorts and chars.  For all
 * input types abs() returns a signed long.
 * abs() should not be used for 64-bit types (s64, u64, long long) - use abs64()
 * for those.
 */

Ouch.
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression: can't apply frequency offsets above 1000ppm.

2015-09-03 Thread Miroslav Lichvar
On Wed, Sep 02, 2015 at 04:16:00PM -0700, John Stultz wrote:
> On Tue, Sep 1, 2015 at 6:14 PM, Nuno Gonçalves  wrote:
> > And just installing chrony from the feeds. With any kernel from 3.17
> > you'll have wrong estimates at chronyc sourcestats.
> 
> Wrong estimates? Could you be more specific about what the failure
> you're seeing is here? The
> 
> I installed the image above, which comes with a 4.1.6 kernel, and
> chrony seems to have gotten my BBB into ~1ms sync w/ servers over the
> internet fairly quickly (at least according to chronyc tracking).

To see the bug with chronyd the initial offset shouldn't be very close
to zero, so it's forced to correct the offset by adjusting the
frequency in a larger step.

I'm attaching a simple C program that prints the frequency offset
as measured between the REALTIME and MONOTONIC_RAW clocks when the
adjtimex tick is set to 9000. It should show values close to -10
ppm and I suspect on the BBB it will be much smaller.

-- 
Miroslav Lichvar
#include 
#include 
#include 
#include 

#define DIFFTS(a, b) ((a.tv_sec - b.tv_sec) + 1e-9 * (a.tv_nsec - b.tv_nsec))

int main() {
struct timespec ts1, ts2, ts3, ts4 = {0};
struct timex t;

t.modes = ADJ_TICK;
t.tick = 9000;
if (adjtimex() < 0)
return 1;

while (1) {
clock_gettime(CLOCK_REALTIME, );
clock_gettime(CLOCK_MONOTONIC_RAW, );

if (ts4.tv_sec)
printf("freq offset = %.0f ppm\n",
(DIFFTS(ts1, ts3) / DIFFTS(ts2, ts4) - 1.0) * 
1e6);

ts3 = ts1;
ts4 = ts2;
usleep(100);
}

return 0;
}


Re: Regression: can't apply frequency offsets above 1000ppm.

2015-09-03 Thread Nuno Gonçalves
Sorry,

The default chrony config from the debian package can't bring the
clock in sync because it doesn't do steps, and it starts 15 years in
the past. You had some other external help.

Assuming "all factory config" just make sure to disable timesyncd:

sudo systemctl disable systemd-timesyncd

And then reboot in a clean chrony state:

sudo systemctl stop chrony && sudo rm /var/lib/chrony/chrony.drift &&
sudo reboot

You can also just cycle power to get back to year 2000, since the
beaglebone doesn't have a battery backed RTC.

After boot you can see this frequency estimates are allways very large
(usually from +-10.000...80.000):

debian@bbb1:~$ chronyc sourcestats
210 Number of sources = 4
Name/IP AddressNP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
==
ftp.claranet.pt 5   3 8 -40146.551 130300  -2720ms61ms
a88-157-128-22.cpe.netcab   5   3 8 -35628.008 129339  -2393ms68ms
a212-113-190-2.cpe.netcab   5   3 8 -45213.773 169140  -3057ms63ms
mirrors.dominios.pt 5   3 8 -39615.191  5.367  -2694ms53ms

To have it perform normally try for example 3.16.3:

sudo apt-get install linux-image-3.16.3-bone6

Thanks,
Nuno

On Thu, Sep 3, 2015 at 12:16 AM, John Stultz  wrote:
> On Tue, Sep 1, 2015 at 6:14 PM, Nuno Gonçalves  wrote:
>> On Wed, Sep 2, 2015 at 2:03 AM, John Stultz  wrote:
>>> On Tue, Sep 1, 2015 at 5:36 PM, Nuno Gonçalves  wrote:
 On Tue, Sep 1, 2015 at 9:25 PM, Thomas Gleixner  wrote:
> On Tue, 1 Sep 2015, Nuno Gonçalves wrote:
>
>> There is a regression on the clock system since v3.16-rc5-111-g4396e05
>> [1],
>
>> [1] 
>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4396e058c52e167729729cf64ea3dfa229637086
>
> That commit has absolutely nothing to do with NTP. I fear your bisect
> went down the wrong road somewhere.

 You are right. It is v3.16-rc5-114-gdc49159:

 http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=dc491596f6394382fbc74ad331156207d619fa0a

 I've triple checked it this time. Not sure where I did the mistake to
 get it wrong by 3 commits.
>>>
>>> This commit is much more believable (though surprising as that change
>>> was found to greatly improve results for most uses).
>>>
>>> Can you provide any more details about how the problem is reproduced
>>> (kernel config, what userland images are you using, etc)?  I've got a
>>> BBB myself so I can try to see whats going on.
>>>
>>> thanks
>>> -john
>>
>> I'm using a clean Debian image:
>>
>> https://rcn-ee.com/rootfs/bb.org/testing/2015-08-31/console/bone-debian-8.1-console-armhf-2015-08-31-2gb.img.xz
>>
>> And just installing chrony from the feeds. With any kernel from 3.17
>> you'll have wrong estimates at chronyc sourcestats.
>
> Wrong estimates? Could you be more specific about what the failure
> you're seeing is here? The
>
> I installed the image above, which comes with a 4.1.6 kernel, and
> chrony seems to have gotten my BBB into ~1ms sync w/ servers over the
> internet fairly quickly (at least according to chronyc tracking).
>
> root@beaglebone:~# chronyc tracking
> Reference ID: 198.110.48.12 (time01.muskegonisd.org)
> Stratum : 3
> Ref time (UTC)  : Wed Sep  2 23:07:05 2015
> System time : 0.001320852 seconds fast of NTP time
> Last offset : +0.001209910 seconds
> RMS offset  : 0.002978454 seconds
> Frequency   : 44.684 ppm fast
> Residual freq   : +0.068 ppm
> Skew: 1.223 ppm
> Root delay  : 0.073661 seconds
> Root dispersion : 0.021902 seconds
> Update interval : 518.3 seconds
> Leap status : Normal
> root@beaglebone:~# chronyc sourcestats
> 210 Number of sources = 4
> Name/IP AddressNP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
> ==
> 172.82.134.51   4   3   17m +1.593 30.515   +798us   802us
> unlawful.id.au  6   5   21m +1.080  2.312  +2731us   273us
> time.theplante.net 11   5   77m -0.536  0.915  -1165us  1044us
> time01.muskegonisd.org  4   3   25m +1.677 15.256   -342us   516us
>
>
> Can you send me your kernel config?
>
> thanks
> -john
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression: can't apply frequency offsets above 1000ppm.

2015-09-03 Thread Nuno Gonçalves
Sorry,

The default chrony config from the debian package can't bring the
clock in sync because it doesn't do steps, and it starts 15 years in
the past. You had some other external help.

Assuming "all factory config" just make sure to disable timesyncd:

sudo systemctl disable systemd-timesyncd

And then reboot in a clean chrony state:

sudo systemctl stop chrony && sudo rm /var/lib/chrony/chrony.drift &&
sudo reboot

You can also just cycle power to get back to year 2000, since the
beaglebone doesn't have a battery backed RTC.

After boot you can see this frequency estimates are allways very large
(usually from +-10.000...80.000):

debian@bbb1:~$ chronyc sourcestats
210 Number of sources = 4
Name/IP AddressNP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
==
ftp.claranet.pt 5   3 8 -40146.551 130300  -2720ms61ms
a88-157-128-22.cpe.netcab   5   3 8 -35628.008 129339  -2393ms68ms
a212-113-190-2.cpe.netcab   5   3 8 -45213.773 169140  -3057ms63ms
mirrors.dominios.pt 5   3 8 -39615.191  5.367  -2694ms53ms

To have it perform normally try for example 3.16.3:

sudo apt-get install linux-image-3.16.3-bone6

Thanks,
Nuno

On Thu, Sep 3, 2015 at 12:16 AM, John Stultz  wrote:
> On Tue, Sep 1, 2015 at 6:14 PM, Nuno Gonçalves  wrote:
>> On Wed, Sep 2, 2015 at 2:03 AM, John Stultz  wrote:
>>> On Tue, Sep 1, 2015 at 5:36 PM, Nuno Gonçalves  wrote:
 On Tue, Sep 1, 2015 at 9:25 PM, Thomas Gleixner  wrote:
> On Tue, 1 Sep 2015, Nuno Gonçalves wrote:
>
>> There is a regression on the clock system since v3.16-rc5-111-g4396e05
>> [1],
>
>> [1] 
>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4396e058c52e167729729cf64ea3dfa229637086
>
> That commit has absolutely nothing to do with NTP. I fear your bisect
> went down the wrong road somewhere.

 You are right. It is v3.16-rc5-114-gdc49159:

 http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=dc491596f6394382fbc74ad331156207d619fa0a

 I've triple checked it this time. Not sure where I did the mistake to
 get it wrong by 3 commits.
>>>
>>> This commit is much more believable (though surprising as that change
>>> was found to greatly improve results for most uses).
>>>
>>> Can you provide any more details about how the problem is reproduced
>>> (kernel config, what userland images are you using, etc)?  I've got a
>>> BBB myself so I can try to see whats going on.
>>>
>>> thanks
>>> -john
>>
>> I'm using a clean Debian image:
>>
>> https://rcn-ee.com/rootfs/bb.org/testing/2015-08-31/console/bone-debian-8.1-console-armhf-2015-08-31-2gb.img.xz
>>
>> And just installing chrony from the feeds. With any kernel from 3.17
>> you'll have wrong estimates at chronyc sourcestats.
>
> Wrong estimates? Could you be more specific about what the failure
> you're seeing is here? The
>
> I installed the image above, which comes with a 4.1.6 kernel, and
> chrony seems to have gotten my BBB into ~1ms sync w/ servers over the
> internet fairly quickly (at least according to chronyc tracking).
>
> root@beaglebone:~# chronyc tracking
> Reference ID: 198.110.48.12 (time01.muskegonisd.org)
> Stratum : 3
> Ref time (UTC)  : Wed Sep  2 23:07:05 2015
> System time : 0.001320852 seconds fast of NTP time
> Last offset : +0.001209910 seconds
> RMS offset  : 0.002978454 seconds
> Frequency   : 44.684 ppm fast
> Residual freq   : +0.068 ppm
> Skew: 1.223 ppm
> Root delay  : 0.073661 seconds
> Root dispersion : 0.021902 seconds
> Update interval : 518.3 seconds
> Leap status : Normal
> root@beaglebone:~# chronyc sourcestats
> 210 Number of sources = 4
> Name/IP AddressNP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
> ==
> 172.82.134.51   4   3   17m +1.593 30.515   +798us   802us
> unlawful.id.au  6   5   21m +1.080  2.312  +2731us   273us
> time.theplante.net 11   5   77m -0.536  0.915  -1165us  1044us
> time01.muskegonisd.org  4   3   25m +1.677 15.256   -342us   516us
>
>
> Can you send me your kernel config?
>
> thanks
> -john
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression: can't apply frequency offsets above 1000ppm.

2015-09-03 Thread Miroslav Lichvar
On Wed, Sep 02, 2015 at 04:16:00PM -0700, John Stultz wrote:
> On Tue, Sep 1, 2015 at 6:14 PM, Nuno Gonçalves  wrote:
> > And just installing chrony from the feeds. With any kernel from 3.17
> > you'll have wrong estimates at chronyc sourcestats.
> 
> Wrong estimates? Could you be more specific about what the failure
> you're seeing is here? The
> 
> I installed the image above, which comes with a 4.1.6 kernel, and
> chrony seems to have gotten my BBB into ~1ms sync w/ servers over the
> internet fairly quickly (at least according to chronyc tracking).

To see the bug with chronyd the initial offset shouldn't be very close
to zero, so it's forced to correct the offset by adjusting the
frequency in a larger step.

I'm attaching a simple C program that prints the frequency offset
as measured between the REALTIME and MONOTONIC_RAW clocks when the
adjtimex tick is set to 9000. It should show values close to -10
ppm and I suspect on the BBB it will be much smaller.

-- 
Miroslav Lichvar
#include 
#include 
#include 
#include 

#define DIFFTS(a, b) ((a.tv_sec - b.tv_sec) + 1e-9 * (a.tv_nsec - b.tv_nsec))

int main() {
struct timespec ts1, ts2, ts3, ts4 = {0};
struct timex t;

t.modes = ADJ_TICK;
t.tick = 9000;
if (adjtimex() < 0)
return 1;

while (1) {
clock_gettime(CLOCK_REALTIME, );
clock_gettime(CLOCK_MONOTONIC_RAW, );

if (ts4.tv_sec)
printf("freq offset = %.0f ppm\n",
(DIFFTS(ts1, ts3) / DIFFTS(ts2, ts4) - 1.0) * 
1e6);

ts3 = ts1;
ts4 = ts2;
usleep(100);
}

return 0;
}


Re: Regression: can't apply frequency offsets above 1000ppm.

2015-09-02 Thread John Stultz
On Tue, Sep 1, 2015 at 6:14 PM, Nuno Gonçalves  wrote:
> On Wed, Sep 2, 2015 at 2:03 AM, John Stultz  wrote:
>> On Tue, Sep 1, 2015 at 5:36 PM, Nuno Gonçalves  wrote:
>>> On Tue, Sep 1, 2015 at 9:25 PM, Thomas Gleixner  wrote:
 On Tue, 1 Sep 2015, Nuno Gonçalves wrote:

> There is a regression on the clock system since v3.16-rc5-111-g4396e05
> [1],

> [1] 
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4396e058c52e167729729cf64ea3dfa229637086

 That commit has absolutely nothing to do with NTP. I fear your bisect
 went down the wrong road somewhere.
>>>
>>> You are right. It is v3.16-rc5-114-gdc49159:
>>>
>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=dc491596f6394382fbc74ad331156207d619fa0a
>>>
>>> I've triple checked it this time. Not sure where I did the mistake to
>>> get it wrong by 3 commits.
>>
>> This commit is much more believable (though surprising as that change
>> was found to greatly improve results for most uses).
>>
>> Can you provide any more details about how the problem is reproduced
>> (kernel config, what userland images are you using, etc)?  I've got a
>> BBB myself so I can try to see whats going on.
>>
>> thanks
>> -john
>
> I'm using a clean Debian image:
>
> https://rcn-ee.com/rootfs/bb.org/testing/2015-08-31/console/bone-debian-8.1-console-armhf-2015-08-31-2gb.img.xz
>
> And just installing chrony from the feeds. With any kernel from 3.17
> you'll have wrong estimates at chronyc sourcestats.

Wrong estimates? Could you be more specific about what the failure
you're seeing is here? The

I installed the image above, which comes with a 4.1.6 kernel, and
chrony seems to have gotten my BBB into ~1ms sync w/ servers over the
internet fairly quickly (at least according to chronyc tracking).

root@beaglebone:~# chronyc tracking
Reference ID: 198.110.48.12 (time01.muskegonisd.org)
Stratum : 3
Ref time (UTC)  : Wed Sep  2 23:07:05 2015
System time : 0.001320852 seconds fast of NTP time
Last offset : +0.001209910 seconds
RMS offset  : 0.002978454 seconds
Frequency   : 44.684 ppm fast
Residual freq   : +0.068 ppm
Skew: 1.223 ppm
Root delay  : 0.073661 seconds
Root dispersion : 0.021902 seconds
Update interval : 518.3 seconds
Leap status : Normal
root@beaglebone:~# chronyc sourcestats
210 Number of sources = 4
Name/IP AddressNP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
==
172.82.134.51   4   3   17m +1.593 30.515   +798us   802us
unlawful.id.au  6   5   21m +1.080  2.312  +2731us   273us
time.theplante.net 11   5   77m -0.536  0.915  -1165us  1044us
time01.muskegonisd.org  4   3   25m +1.677 15.256   -342us   516us


Can you send me your kernel config?

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression: can't apply frequency offsets above 1000ppm.

2015-09-02 Thread Miroslav Lichvar
On Wed, Sep 02, 2015 at 02:14:21AM +0100, Nuno Gonçalves wrote:
> On Wed, Sep 2, 2015 at 2:03 AM, John Stultz  wrote:
> > On Tue, Sep 1, 2015 at 5:36 PM, Nuno Gonçalves  wrote:
> >> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=dc491596f6394382fbc74ad331156207d619fa0a
> >>
> >> I've triple checked it this time. Not sure where I did the mistake to
> >> get it wrong by 3 commits.
> >
> > This commit is much more believable (though surprising as that change
> > was found to greatly improve results for most uses).
> >
> > Can you provide any more details about how the problem is reproduced
> > (kernel config, what userland images are you using, etc)?  I've got a
> > BBB myself so I can try to see whats going on.

> And just installing chrony from the feeds. With any kernel from 3.17
> you'll have wrong estimates at chronyc sourcestats.

Another reproducer is to disable chronyd, set the adjtimex tick to
9000 (e.g. by the adjtimex utility) and observe how is the offset of
the clock changing over time, e.g. by running periodically ntpdate -q
or chronyd -Q. It should be losing about 0.1 second per second, but
the actual frequency offset seems to be much smaller.

> Miroslav also dismissed this being related to nohz after some tests.

Yeah, the problem didn't disappear when the kernel was booted with
nohz=off, so I thought it was something else. Now that it seems it
indeed is related to nohz, I guess it's not a problem with the clock
update interval being too long (which the referenced commit
addressed).

Anyone knows what values (mask, mult, shift, maxadj) has the
clocksource in this case? I'd like to try to reproduce the problem in
the simulator.

-- 
Miroslav Lichvar
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression: can't apply frequency offsets above 1000ppm.

2015-09-02 Thread Miroslav Lichvar
On Wed, Sep 02, 2015 at 02:14:21AM +0100, Nuno Gonçalves wrote:
> On Wed, Sep 2, 2015 at 2:03 AM, John Stultz  wrote:
> > On Tue, Sep 1, 2015 at 5:36 PM, Nuno Gonçalves  wrote:
> >> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=dc491596f6394382fbc74ad331156207d619fa0a
> >>
> >> I've triple checked it this time. Not sure where I did the mistake to
> >> get it wrong by 3 commits.
> >
> > This commit is much more believable (though surprising as that change
> > was found to greatly improve results for most uses).
> >
> > Can you provide any more details about how the problem is reproduced
> > (kernel config, what userland images are you using, etc)?  I've got a
> > BBB myself so I can try to see whats going on.

> And just installing chrony from the feeds. With any kernel from 3.17
> you'll have wrong estimates at chronyc sourcestats.

Another reproducer is to disable chronyd, set the adjtimex tick to
9000 (e.g. by the adjtimex utility) and observe how is the offset of
the clock changing over time, e.g. by running periodically ntpdate -q
or chronyd -Q. It should be losing about 0.1 second per second, but
the actual frequency offset seems to be much smaller.

> Miroslav also dismissed this being related to nohz after some tests.

Yeah, the problem didn't disappear when the kernel was booted with
nohz=off, so I thought it was something else. Now that it seems it
indeed is related to nohz, I guess it's not a problem with the clock
update interval being too long (which the referenced commit
addressed).

Anyone knows what values (mask, mult, shift, maxadj) has the
clocksource in this case? I'd like to try to reproduce the problem in
the simulator.

-- 
Miroslav Lichvar
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression: can't apply frequency offsets above 1000ppm.

2015-09-02 Thread John Stultz
On Tue, Sep 1, 2015 at 6:14 PM, Nuno Gonçalves  wrote:
> On Wed, Sep 2, 2015 at 2:03 AM, John Stultz  wrote:
>> On Tue, Sep 1, 2015 at 5:36 PM, Nuno Gonçalves  wrote:
>>> On Tue, Sep 1, 2015 at 9:25 PM, Thomas Gleixner  wrote:
 On Tue, 1 Sep 2015, Nuno Gonçalves wrote:

> There is a regression on the clock system since v3.16-rc5-111-g4396e05
> [1],

> [1] 
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4396e058c52e167729729cf64ea3dfa229637086

 That commit has absolutely nothing to do with NTP. I fear your bisect
 went down the wrong road somewhere.
>>>
>>> You are right. It is v3.16-rc5-114-gdc49159:
>>>
>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=dc491596f6394382fbc74ad331156207d619fa0a
>>>
>>> I've triple checked it this time. Not sure where I did the mistake to
>>> get it wrong by 3 commits.
>>
>> This commit is much more believable (though surprising as that change
>> was found to greatly improve results for most uses).
>>
>> Can you provide any more details about how the problem is reproduced
>> (kernel config, what userland images are you using, etc)?  I've got a
>> BBB myself so I can try to see whats going on.
>>
>> thanks
>> -john
>
> I'm using a clean Debian image:
>
> https://rcn-ee.com/rootfs/bb.org/testing/2015-08-31/console/bone-debian-8.1-console-armhf-2015-08-31-2gb.img.xz
>
> And just installing chrony from the feeds. With any kernel from 3.17
> you'll have wrong estimates at chronyc sourcestats.

Wrong estimates? Could you be more specific about what the failure
you're seeing is here? The

I installed the image above, which comes with a 4.1.6 kernel, and
chrony seems to have gotten my BBB into ~1ms sync w/ servers over the
internet fairly quickly (at least according to chronyc tracking).

root@beaglebone:~# chronyc tracking
Reference ID: 198.110.48.12 (time01.muskegonisd.org)
Stratum : 3
Ref time (UTC)  : Wed Sep  2 23:07:05 2015
System time : 0.001320852 seconds fast of NTP time
Last offset : +0.001209910 seconds
RMS offset  : 0.002978454 seconds
Frequency   : 44.684 ppm fast
Residual freq   : +0.068 ppm
Skew: 1.223 ppm
Root delay  : 0.073661 seconds
Root dispersion : 0.021902 seconds
Update interval : 518.3 seconds
Leap status : Normal
root@beaglebone:~# chronyc sourcestats
210 Number of sources = 4
Name/IP AddressNP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
==
172.82.134.51   4   3   17m +1.593 30.515   +798us   802us
unlawful.id.au  6   5   21m +1.080  2.312  +2731us   273us
time.theplante.net 11   5   77m -0.536  0.915  -1165us  1044us
time01.muskegonisd.org  4   3   25m +1.677 15.256   -342us   516us


Can you send me your kernel config?

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression: can't apply frequency offsets above 1000ppm.

2015-09-01 Thread Nuno Gonçalves
On Wed, Sep 2, 2015 at 2:03 AM, John Stultz  wrote:
> On Tue, Sep 1, 2015 at 5:36 PM, Nuno Gonçalves  wrote:
>> On Tue, Sep 1, 2015 at 9:25 PM, Thomas Gleixner  wrote:
>>> On Tue, 1 Sep 2015, Nuno Gonçalves wrote:
>>>
 There is a regression on the clock system since v3.16-rc5-111-g4396e05
 [1],
>>>
 [1] 
 http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4396e058c52e167729729cf64ea3dfa229637086
>>>
>>> That commit has absolutely nothing to do with NTP. I fear your bisect
>>> went down the wrong road somewhere.
>>
>> You are right. It is v3.16-rc5-114-gdc49159:
>>
>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=dc491596f6394382fbc74ad331156207d619fa0a
>>
>> I've triple checked it this time. Not sure where I did the mistake to
>> get it wrong by 3 commits.
>
> This commit is much more believable (though surprising as that change
> was found to greatly improve results for most uses).
>
> Can you provide any more details about how the problem is reproduced
> (kernel config, what userland images are you using, etc)?  I've got a
> BBB myself so I can try to see whats going on.
>
> thanks
> -john

I'm using a clean Debian image:

https://rcn-ee.com/rootfs/bb.org/testing/2015-08-31/console/bone-debian-8.1-console-armhf-2015-08-31-2gb.img.xz

And just installing chrony from the feeds. With any kernel from 3.17
you'll have wrong estimates at chronyc sourcestats.

Miroslav did some tests at a beaglebone I set for him, according to my
initial post.

Miroslav also dismissed this being related to nohz after some tests.

Thanks,
Nuno
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression: can't apply frequency offsets above 1000ppm.

2015-09-01 Thread John Stultz
On Tue, Sep 1, 2015 at 5:36 PM, Nuno Gonçalves  wrote:
> On Tue, Sep 1, 2015 at 9:25 PM, Thomas Gleixner  wrote:
>> On Tue, 1 Sep 2015, Nuno Gonçalves wrote:
>>
>>> There is a regression on the clock system since v3.16-rc5-111-g4396e05
>>> [1],
>>
>>> [1] 
>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4396e058c52e167729729cf64ea3dfa229637086
>>
>> That commit has absolutely nothing to do with NTP. I fear your bisect
>> went down the wrong road somewhere.
>
> You are right. It is v3.16-rc5-114-gdc49159:
>
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=dc491596f6394382fbc74ad331156207d619fa0a
>
> I've triple checked it this time. Not sure where I did the mistake to
> get it wrong by 3 commits.

This commit is much more believable (though surprising as that change
was found to greatly improve results for most uses).

Can you provide any more details about how the problem is reproduced
(kernel config, what userland images are you using, etc)?  I've got a
BBB myself so I can try to see whats going on.

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression: can't apply frequency offsets above 1000ppm.

2015-09-01 Thread Nuno Gonçalves
On Tue, Sep 1, 2015 at 9:25 PM, Thomas Gleixner  wrote:
> On Tue, 1 Sep 2015, Nuno Gonçalves wrote:
>
>> There is a regression on the clock system since v3.16-rc5-111-g4396e05
>> [1],
>
>> [1] 
>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4396e058c52e167729729cf64ea3dfa229637086
>
> That commit has absolutely nothing to do with NTP. I fear your bisect
> went down the wrong road somewhere.

You are right. It is v3.16-rc5-114-gdc49159:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=dc491596f6394382fbc74ad331156207d619fa0a

I've triple checked it this time. Not sure where I did the mistake to
get it wrong by 3 commits.

Thanks,
Nuno
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression: can't apply frequency offsets above 1000ppm.

2015-09-01 Thread Thomas Gleixner
On Tue, 1 Sep 2015, John Stultz wrote:
> On Tue, Sep 1, 2015 at 1:25 PM, Thomas Gleixner  wrote:
> > On Tue, 1 Sep 2015, Nuno Gonçalves wrote:
> >
> >> There is a regression on the clock system since v3.16-rc5-111-g4396e05
> >> [1],
> >
> >> [1] 
> >> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4396e058c52e167729729cf64ea3dfa229637086
> >
> > That commit has absolutely nothing to do with NTP. I fear your bisect
> > went down the wrong road somewhere.
> >
> >> where the clock doesn't apply frequency offsets above about
> >> 1000ppm [2].
> >
> > This looks pretty familiar.
> >
> > The issue was introduced with commit 5e5aeb4367b (time: adjtimex:
> > Validate the ADJ_FREQUENCY values). That patch was tagged for stable,
> > so it got backported.
> >
> > The fix is in commit 29183a70b0b82 (ntp: Fixup adjtimex freq
> > validation on 32-bit systems). That commit was tagged for stable as
> > well, but with the extra '#3.19+' limitation.
> >
> > So in the worst case 5e5aeb4367b hit a stable tree < 3.19, but
> > 29183a70b0b82 did not.
> 
> Hrm. So that would be problematic, and I'll have to follow up that
> both changes got backported together.
> 
> But I don't think that's the issue here, since the problem supposedly
> continues w/ 4.2...

Duh, missed that part.

Thanks,

tglx

Re: Regression: can't apply frequency offsets above 1000ppm.

2015-09-01 Thread John Stultz
On Tue, Sep 1, 2015 at 1:25 PM, Thomas Gleixner  wrote:
> On Tue, 1 Sep 2015, Nuno Gonçalves wrote:
>
>> There is a regression on the clock system since v3.16-rc5-111-g4396e05
>> [1],
>
>> [1] 
>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4396e058c52e167729729cf64ea3dfa229637086
>
> That commit has absolutely nothing to do with NTP. I fear your bisect
> went down the wrong road somewhere.
>
>> where the clock doesn't apply frequency offsets above about
>> 1000ppm [2].
>
> This looks pretty familiar.
>
> The issue was introduced with commit 5e5aeb4367b (time: adjtimex:
> Validate the ADJ_FREQUENCY values). That patch was tagged for stable,
> so it got backported.
>
> The fix is in commit 29183a70b0b82 (ntp: Fixup adjtimex freq
> validation on 32-bit systems). That commit was tagged for stable as
> well, but with the extra '#3.19+' limitation.
>
> So in the worst case 5e5aeb4367b hit a stable tree < 3.19, but
> 29183a70b0b82 did not.

Hrm. So that would be problematic, and I'll have to follow up that
both changes got backported together.

But I don't think that's the issue here, since the problem supposedly
continues w/ 4.2...

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression: can't apply frequency offsets above 1000ppm.

2015-09-01 Thread John Stultz
On Tue, Sep 1, 2015 at 1:02 PM, Nuno Gonçalves  wrote:
> There is a regression on the clock system since v3.16-rc5-111-g4396e05
> [1], where the clock doesn't apply frequency offsets above about
> 1000ppm [2].

??? Hrm. I'm not sure I'm understanding here.

Adjtimex frequency limits have been capped to ~500ppm for a long time
and the commit you pointed to in [1] doesn't seem to be at all related
to the adjtimex interface.

Can you be more specific about which interface and arguments you're
using which are causing troubles?
Miroslav: Do you have further details?

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression: can't apply frequency offsets above 1000ppm.

2015-09-01 Thread Thomas Gleixner
On Tue, 1 Sep 2015, Nuno Gonçalves wrote:

> There is a regression on the clock system since v3.16-rc5-111-g4396e05
> [1],

> [1] 
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4396e058c52e167729729cf64ea3dfa229637086

That commit has absolutely nothing to do with NTP. I fear your bisect
went down the wrong road somewhere.

> where the clock doesn't apply frequency offsets above about
> 1000ppm [2].

This looks pretty familiar.

The issue was introduced with commit 5e5aeb4367b (time: adjtimex:
Validate the ADJ_FREQUENCY values). That patch was tagged for stable,
so it got backported.

The fix is in commit 29183a70b0b82 (ntp: Fixup adjtimex freq
validation on 32-bit systems). That commit was tagged for stable as
well, but with the extra '#3.19+' limitation.

So in the worst case 5e5aeb4367b hit a stable tree < 3.19, but
29183a70b0b82 did not.

Thanks,

tglx



Regression: can't apply frequency offsets above 1000ppm.

2015-09-01 Thread Nuno Gonçalves
There is a regression on the clock system since v3.16-rc5-111-g4396e05
[1], where the clock doesn't apply frequency offsets above about
1000ppm [2].

This was found at a Beaglebone (armv7l, TI AM335x).

Thanks,
Nuno Goncalves

[1] 
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4396e058c52e167729729cf64ea3dfa229637086
[2] 
http://listengine.tuxfamily.org/chrony.tuxfamily.org/chrony-users/2015/08/msg00022.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression: can't apply frequency offsets above 1000ppm.

2015-09-01 Thread John Stultz
On Tue, Sep 1, 2015 at 5:36 PM, Nuno Gonçalves  wrote:
> On Tue, Sep 1, 2015 at 9:25 PM, Thomas Gleixner  wrote:
>> On Tue, 1 Sep 2015, Nuno Gonçalves wrote:
>>
>>> There is a regression on the clock system since v3.16-rc5-111-g4396e05
>>> [1],
>>
>>> [1] 
>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4396e058c52e167729729cf64ea3dfa229637086
>>
>> That commit has absolutely nothing to do with NTP. I fear your bisect
>> went down the wrong road somewhere.
>
> You are right. It is v3.16-rc5-114-gdc49159:
>
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=dc491596f6394382fbc74ad331156207d619fa0a
>
> I've triple checked it this time. Not sure where I did the mistake to
> get it wrong by 3 commits.

This commit is much more believable (though surprising as that change
was found to greatly improve results for most uses).

Can you provide any more details about how the problem is reproduced
(kernel config, what userland images are you using, etc)?  I've got a
BBB myself so I can try to see whats going on.

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression: can't apply frequency offsets above 1000ppm.

2015-09-01 Thread Nuno Gonçalves
On Tue, Sep 1, 2015 at 9:25 PM, Thomas Gleixner  wrote:
> On Tue, 1 Sep 2015, Nuno Gonçalves wrote:
>
>> There is a regression on the clock system since v3.16-rc5-111-g4396e05
>> [1],
>
>> [1] 
>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4396e058c52e167729729cf64ea3dfa229637086
>
> That commit has absolutely nothing to do with NTP. I fear your bisect
> went down the wrong road somewhere.

You are right. It is v3.16-rc5-114-gdc49159:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=dc491596f6394382fbc74ad331156207d619fa0a

I've triple checked it this time. Not sure where I did the mistake to
get it wrong by 3 commits.

Thanks,
Nuno
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression: can't apply frequency offsets above 1000ppm.

2015-09-01 Thread Nuno Gonçalves
On Wed, Sep 2, 2015 at 2:03 AM, John Stultz  wrote:
> On Tue, Sep 1, 2015 at 5:36 PM, Nuno Gonçalves  wrote:
>> On Tue, Sep 1, 2015 at 9:25 PM, Thomas Gleixner  wrote:
>>> On Tue, 1 Sep 2015, Nuno Gonçalves wrote:
>>>
 There is a regression on the clock system since v3.16-rc5-111-g4396e05
 [1],
>>>
 [1] 
 http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4396e058c52e167729729cf64ea3dfa229637086
>>>
>>> That commit has absolutely nothing to do with NTP. I fear your bisect
>>> went down the wrong road somewhere.
>>
>> You are right. It is v3.16-rc5-114-gdc49159:
>>
>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=dc491596f6394382fbc74ad331156207d619fa0a
>>
>> I've triple checked it this time. Not sure where I did the mistake to
>> get it wrong by 3 commits.
>
> This commit is much more believable (though surprising as that change
> was found to greatly improve results for most uses).
>
> Can you provide any more details about how the problem is reproduced
> (kernel config, what userland images are you using, etc)?  I've got a
> BBB myself so I can try to see whats going on.
>
> thanks
> -john

I'm using a clean Debian image:

https://rcn-ee.com/rootfs/bb.org/testing/2015-08-31/console/bone-debian-8.1-console-armhf-2015-08-31-2gb.img.xz

And just installing chrony from the feeds. With any kernel from 3.17
you'll have wrong estimates at chronyc sourcestats.

Miroslav did some tests at a beaglebone I set for him, according to my
initial post.

Miroslav also dismissed this being related to nohz after some tests.

Thanks,
Nuno
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression: can't apply frequency offsets above 1000ppm.

2015-09-01 Thread Thomas Gleixner
On Tue, 1 Sep 2015, Nuno Gonçalves wrote:

> There is a regression on the clock system since v3.16-rc5-111-g4396e05
> [1],

> [1] 
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4396e058c52e167729729cf64ea3dfa229637086

That commit has absolutely nothing to do with NTP. I fear your bisect
went down the wrong road somewhere.

> where the clock doesn't apply frequency offsets above about
> 1000ppm [2].

This looks pretty familiar.

The issue was introduced with commit 5e5aeb4367b (time: adjtimex:
Validate the ADJ_FREQUENCY values). That patch was tagged for stable,
so it got backported.

The fix is in commit 29183a70b0b82 (ntp: Fixup adjtimex freq
validation on 32-bit systems). That commit was tagged for stable as
well, but with the extra '#3.19+' limitation.

So in the worst case 5e5aeb4367b hit a stable tree < 3.19, but
29183a70b0b82 did not.

Thanks,

tglx



Regression: can't apply frequency offsets above 1000ppm.

2015-09-01 Thread Nuno Gonçalves
There is a regression on the clock system since v3.16-rc5-111-g4396e05
[1], where the clock doesn't apply frequency offsets above about
1000ppm [2].

This was found at a Beaglebone (armv7l, TI AM335x).

Thanks,
Nuno Goncalves

[1] 
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4396e058c52e167729729cf64ea3dfa229637086
[2] 
http://listengine.tuxfamily.org/chrony.tuxfamily.org/chrony-users/2015/08/msg00022.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression: can't apply frequency offsets above 1000ppm.

2015-09-01 Thread John Stultz
On Tue, Sep 1, 2015 at 1:25 PM, Thomas Gleixner  wrote:
> On Tue, 1 Sep 2015, Nuno Gonçalves wrote:
>
>> There is a regression on the clock system since v3.16-rc5-111-g4396e05
>> [1],
>
>> [1] 
>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4396e058c52e167729729cf64ea3dfa229637086
>
> That commit has absolutely nothing to do with NTP. I fear your bisect
> went down the wrong road somewhere.
>
>> where the clock doesn't apply frequency offsets above about
>> 1000ppm [2].
>
> This looks pretty familiar.
>
> The issue was introduced with commit 5e5aeb4367b (time: adjtimex:
> Validate the ADJ_FREQUENCY values). That patch was tagged for stable,
> so it got backported.
>
> The fix is in commit 29183a70b0b82 (ntp: Fixup adjtimex freq
> validation on 32-bit systems). That commit was tagged for stable as
> well, but with the extra '#3.19+' limitation.
>
> So in the worst case 5e5aeb4367b hit a stable tree < 3.19, but
> 29183a70b0b82 did not.

Hrm. So that would be problematic, and I'll have to follow up that
both changes got backported together.

But I don't think that's the issue here, since the problem supposedly
continues w/ 4.2...

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression: can't apply frequency offsets above 1000ppm.

2015-09-01 Thread John Stultz
On Tue, Sep 1, 2015 at 1:02 PM, Nuno Gonçalves  wrote:
> There is a regression on the clock system since v3.16-rc5-111-g4396e05
> [1], where the clock doesn't apply frequency offsets above about
> 1000ppm [2].

??? Hrm. I'm not sure I'm understanding here.

Adjtimex frequency limits have been capped to ~500ppm for a long time
and the commit you pointed to in [1] doesn't seem to be at all related
to the adjtimex interface.

Can you be more specific about which interface and arguments you're
using which are causing troubles?
Miroslav: Do you have further details?

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression: can't apply frequency offsets above 1000ppm.

2015-09-01 Thread Thomas Gleixner
On Tue, 1 Sep 2015, John Stultz wrote:
> On Tue, Sep 1, 2015 at 1:25 PM, Thomas Gleixner  wrote:
> > On Tue, 1 Sep 2015, Nuno Gonçalves wrote:
> >
> >> There is a regression on the clock system since v3.16-rc5-111-g4396e05
> >> [1],
> >
> >> [1] 
> >> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4396e058c52e167729729cf64ea3dfa229637086
> >
> > That commit has absolutely nothing to do with NTP. I fear your bisect
> > went down the wrong road somewhere.
> >
> >> where the clock doesn't apply frequency offsets above about
> >> 1000ppm [2].
> >
> > This looks pretty familiar.
> >
> > The issue was introduced with commit 5e5aeb4367b (time: adjtimex:
> > Validate the ADJ_FREQUENCY values). That patch was tagged for stable,
> > so it got backported.
> >
> > The fix is in commit 29183a70b0b82 (ntp: Fixup adjtimex freq
> > validation on 32-bit systems). That commit was tagged for stable as
> > well, but with the extra '#3.19+' limitation.
> >
> > So in the worst case 5e5aeb4367b hit a stable tree < 3.19, but
> > 29183a70b0b82 did not.
> 
> Hrm. So that would be problematic, and I'll have to follow up that
> both changes got backported together.
> 
> But I don't think that's the issue here, since the problem supposedly
> continues w/ 4.2...

Duh, missed that part.

Thanks,

tglx