Re: PROBLEM: monotonic clock going backwards on ppc64

2019-03-01 Thread Thomas Gleixner
On Sat, 2 Mar 2019, Michael Ellerman wrote:
> > The problem seems to be in vDSO code in 
> > arch/powerpc/kernel/vdso64/gettimeofday.S.
> 
> You're right, the wall-to-monotonic offset (wtom_clock_sec) is a signed
> 32-bit value, so that seems like it's going to have problems.
> 
> If I do `date -s 2037-1-1` I see:
> 
> [   26.024061] update_vsyscall: tk->wall_to_monotonic.tv_sec -2114341175
> [   26.042633] update_vsyscall: vdso_data->wtom_clock_sec-2114341175
> 
> Which looks sane.
> 
> But then 2040-1-1 shows:
> 
> [   32.617020] update_vsyscall: tk->wall_to_monotonic.tv_sec -2208949168
> [   32.632642] update_vsyscall: vdso_data->wtom_clock_sec 2086018128
> 
> ie. the larger negative offset has overflowed and become positive.
> 
> But then when we go back to 2037 we get a negative offset again and
> monotonic time appears to go backward and things are unhappy.
> 
> I don't know this code well, but the patch below *appears* to work. I'll
> have a closer look on Monday.

Looks about right. The core hands in the 64bit offset already.

Thanks,

tglx




Re: PROBLEM: monotonic clock going backwards on ppc64

2019-03-01 Thread Jakub Drnec
Hi,

>> Decreasing the realtime clock across the y2k38 threshold is one reliable way 
>> to reproduce.
>> Allegedly this can also happen just by running ntpd, I have not managed to 
>> reproduce that other
>> than booting with rtc at >2038 and then running ntp.

> Isn't it the expected behavior. Here is what I see for powermac:
>...gives us a time in the documented 1904..2040 year range. I decided not

This (or some related issue involving monotonic time going wonky) surfaced on a 
system running 4.5 kernel with sane clock values running ntpd. It may not be 
limited to setting the clock beyond 2038, that is just a convenient way to 
reproduce something visibly wrong.

If the "ntpd-related monotonic" issue is separate from what I reported, I did 
not find anything in the commit history that looks likely to have fixed that, 
so I think that is still a thing.

Another way of looking at the issue is that when you set the clock beyond 2038 
(or the mystery ntpd thing happens), vdso and syscall versions of clock_gettime 
start reporting wildly different values for CLOCK_MONOTONIC. If someone manages 
to somehow interlace calls to those two assuming they are consistent, crazy 
bugs will ensue...

Regards,
Jakub


Re: PROBLEM: monotonic clock going backwards on ppc64

2019-03-01 Thread Michael Ellerman
Hi Jakub,

[Cc += Timekeeping maintainers]

"Jakub Drnec"  writes:
> Hi all,
>
> I think I observed a potential problem, is this the correct place to report 
> it? (CC me, not on list)
>
> [1.] One line summary: monotonic clock can be made to decrease on ppc64
> [2.] Full description:
> Setting the realtime clock can sometimes make the monotonic clock go back by 
> over a hundred years.
> Decreasing the realtime clock across the y2k38 threshold is one reliable way 
> to reproduce.
> Allegedly this can also happen just by running ntpd, I have not managed to 
> reproduce that other
> than booting with rtc at >2038 and then running ntp.
> When this happens, anything with timers (e.g. openjdk) breaks rather badly.

Thanks for the report.

> The problem seems to be in vDSO code in 
> arch/powerpc/kernel/vdso64/gettimeofday.S.

You're right, the wall-to-monotonic offset (wtom_clock_sec) is a signed
32-bit value, so that seems like it's going to have problems.

If I do `date -s 2037-1-1` I see:

[   26.024061] update_vsyscall: tk->wall_to_monotonic.tv_sec -2114341175
[   26.042633] update_vsyscall: vdso_data->wtom_clock_sec-2114341175

Which looks sane.

But then 2040-1-1 shows:

[   32.617020] update_vsyscall: tk->wall_to_monotonic.tv_sec -2208949168
[   32.632642] update_vsyscall: vdso_data->wtom_clock_sec 2086018128

ie. the larger negative offset has overflowed and become positive.

But then when we go back to 2037 we get a negative offset again and
monotonic time appears to go backward and things are unhappy.

I don't know this code well, but the patch below *appears* to work. I'll
have a closer look on Monday.

cheers


diff --git a/arch/powerpc/include/asm/vdso_datapage.h 
b/arch/powerpc/include/asm/vdso_datapage.h
index 1afe90ade595..139133ec21d5 100644
--- a/arch/powerpc/include/asm/vdso_datapage.h
+++ b/arch/powerpc/include/asm/vdso_datapage.h
@@ -82,7 +82,7 @@ struct vdso_data {
__u32 icache_block_size;/* L1 i-cache block size */
__u32 dcache_log_block_size;/* L1 d-cache log block size */
__u32 icache_log_block_size;/* L1 i-cache log block size */
-   __s32 wtom_clock_sec;   /* Wall to monotonic clock */
+   __s64 wtom_clock_sec;   /* Wall to monotonic clock */
__s32 wtom_clock_nsec;
struct timespec stamp_xtime;/* xtime as at tb_orig_stamp */
__u32 stamp_sec_fraction;   /* fractional seconds of stamp_xtime */
diff --git a/arch/powerpc/kernel/vdso64/gettimeofday.S 
b/arch/powerpc/kernel/vdso64/gettimeofday.S
index a4ed9edfd5f0..1f324c28705b 100644
--- a/arch/powerpc/kernel/vdso64/gettimeofday.S
+++ b/arch/powerpc/kernel/vdso64/gettimeofday.S
@@ -92,7 +92,7 @@ V_FUNCTION_BEGIN(__kernel_clock_gettime)
 * At this point, r4,r5 contain our sec/nsec values.
 */
 
-   lwa r6,WTOM_CLOCK_SEC(r3)
+   ld  r6,WTOM_CLOCK_SEC(r3)
lwa r9,WTOM_CLOCK_NSEC(r3)
 
/* We now have our result in r6,r9. We create a fake dependency
@@ -125,7 +125,7 @@ V_FUNCTION_BEGIN(__kernel_clock_gettime)
bne cr6,75f
 
/* CLOCK_MONOTONIC_COARSE */
-   lwa r6,WTOM_CLOCK_SEC(r3)
+   ld  r6,WTOM_CLOCK_SEC(r3)
lwa r9,WTOM_CLOCK_NSEC(r3)
 
/* check if counter has updated */


> [3.] Keywords: gettimeofday, ppc64, vdso
> [4.] Kernel information
> [4.1.] Kernel version: any (tested on 4.19)
> [4.2.] Kernel .config file: any
> [5.] Most recent kernel version which did not have the bug: not a regression
> [6.] Output of Oops..: not applicable
> [7.] Example program which triggers the problem
> --- testcase.c
> #include 
> #include 
> #include 
> #include 
>
> long get_time() {
>   struct timespec tp;
>   if (clock_gettime(CLOCK_MONOTONIC, ) != 0) {
> perror("clock_gettime failed");
> exit(1);
>   }
>   long result = tp.tv_sec + tp.tv_nsec / 10;
>   return result;
> }
>
> int main() {
>   printf("monitoring monotonic clock...\n");
>   long last = get_time();
>   while(1) {
> long now = get_time();
> if (now < last) {
>   printf("clock went backwards by %ld seconds!\n",
> last - now);
> }
> last = now;
> sleep(1);
>   }
>   return 0;
> }
> ---
> when running
> # date -s 2040-1-1
> # date -s 2037-1-1
> program outputs: clock went backwards by 4294967295 seconds!
>
> [8.] Environment: any ppc64, currently reproducing on qemu-system-ppc64le 
> running debian unstable
> [X.] Other notes, patches, fixes, workarounds:
> The problem seems to be in vDSO code in 
> arch/powerpc/kernel/vdso64/gettimeofday.S.
> (possibly because some values used in the calculation are only 32 bit?)
> Slightly silly workaround: 
> nuke the "cmpwi cr1,r3,CLOCK_MONOTONIC" in __kernel_clock_gettime
> Now it always goes through the syscall fallback which does not have the same 
> problem.
>
> Regards,
> Jakub Drnec


Re: PROBLEM: monotonic clock going backwards on ppc64

2019-02-28 Thread Daniel Axtens
Mathieu Malaterre  writes:

> On Tue, Feb 26, 2019 at 9:36 PM Jakub Drnec  wrote:
>>
>> Hi all,
>>
>> I think I observed a potential problem, is this the correct place to report 
>> it? (CC me, not on list)
>>
>> [1.] One line summary: monotonic clock can be made to decrease on ppc64
>> [2.] Full description:
>> Setting the realtime clock can sometimes make the monotonic clock go back by 
>> over a hundred years.
>> Decreasing the realtime clock across the y2k38 threshold is one reliable way 
>> to reproduce.
>> Allegedly this can also happen just by running ntpd, I have not managed to 
>> reproduce that other
>> than booting with rtc at >2038 and then running ntp.
>
> Isn't it the expected behavior. Here is what I see for powermac:
>
> $ git show 22db552b50fa
> ...
> This changes the logic to cast to an unsigned 32-bit number first for
> the Macintosh time and then convert that to the Unix time, which then
> gives us a time in the documented 1904..2040 year range. I decided not
> to use the longer 1970..2106 range that other drivers use, for
> consistency with the literal interpretation of the register, but that
> could be easily changed if we decide we want to support any Mac after
> 2040.
> ...
>

My interpretation of that commit is that it relates to the kernel
reading the hardware RTC on a powermac, but this issue relates to
userspace fetching the time from the vDSO. I'm also not running on a
powermac, so my hardware should be able to deal with times > 2040...


Regards,
Daniel

>> When this happens, anything with timers (e.g. openjdk) breaks rather badly.
>>
>> [3.] Keywords: gettimeofday, ppc64, vdso
>> [4.] Kernel information
>> [4.1.] Kernel version: any (tested on 4.19)
>> [4.2.] Kernel .config file: any
>> [5.] Most recent kernel version which did not have the bug: not a regression
>> [6.] Output of Oops..: not applicable
>> [7.] Example program which triggers the problem
>> --- testcase.c
>> #include 
>> #include 
>> #include 
>> #include 
>>
>> long get_time() {
>>   struct timespec tp;
>>   if (clock_gettime(CLOCK_MONOTONIC, ) != 0) {
>> perror("clock_gettime failed");
>> exit(1);
>>   }
>>   long result = tp.tv_sec + tp.tv_nsec / 10;
>>   return result;
>> }
>>
>> int main() {
>>   printf("monitoring monotonic clock...\n");
>>   long last = get_time();
>>   while(1) {
>> long now = get_time();
>> if (now < last) {
>>   printf("clock went backwards by %ld seconds!\n",
>> last - now);
>> }
>> last = now;
>> sleep(1);
>>   }
>>   return 0;
>> }
>> ---
>> when running
>> # date -s 2040-1-1
>> # date -s 2037-1-1
>> program outputs: clock went backwards by 4294967295 seconds!
>>
>> [8.] Environment: any ppc64, currently reproducing on qemu-system-ppc64le 
>> running debian unstable
>> [X.] Other notes, patches, fixes, workarounds:
>> The problem seems to be in vDSO code in 
>> arch/powerpc/kernel/vdso64/gettimeofday.S.
>> (possibly because some values used in the calculation are only 32 bit?)
>> Slightly silly workaround:
>> nuke the "cmpwi cr1,r3,CLOCK_MONOTONIC" in __kernel_clock_gettime
>> Now it always goes through the syscall fallback which does not have the same 
>> problem.
>>
>> Regards,
>> Jakub Drnec


Re: PROBLEM: monotonic clock going backwards on ppc64

2019-02-27 Thread Mathieu Malaterre
On Tue, Feb 26, 2019 at 9:36 PM Jakub Drnec  wrote:
>
> Hi all,
>
> I think I observed a potential problem, is this the correct place to report 
> it? (CC me, not on list)
>
> [1.] One line summary: monotonic clock can be made to decrease on ppc64
> [2.] Full description:
> Setting the realtime clock can sometimes make the monotonic clock go back by 
> over a hundred years.
> Decreasing the realtime clock across the y2k38 threshold is one reliable way 
> to reproduce.
> Allegedly this can also happen just by running ntpd, I have not managed to 
> reproduce that other
> than booting with rtc at >2038 and then running ntp.

Isn't it the expected behavior. Here is what I see for powermac:

$ git show 22db552b50fa
...
This changes the logic to cast to an unsigned 32-bit number first for
the Macintosh time and then convert that to the Unix time, which then
gives us a time in the documented 1904..2040 year range. I decided not
to use the longer 1970..2106 range that other drivers use, for
consistency with the literal interpretation of the register, but that
could be easily changed if we decide we want to support any Mac after
2040.
...

> When this happens, anything with timers (e.g. openjdk) breaks rather badly.
>
> [3.] Keywords: gettimeofday, ppc64, vdso
> [4.] Kernel information
> [4.1.] Kernel version: any (tested on 4.19)
> [4.2.] Kernel .config file: any
> [5.] Most recent kernel version which did not have the bug: not a regression
> [6.] Output of Oops..: not applicable
> [7.] Example program which triggers the problem
> --- testcase.c
> #include 
> #include 
> #include 
> #include 
>
> long get_time() {
>   struct timespec tp;
>   if (clock_gettime(CLOCK_MONOTONIC, ) != 0) {
> perror("clock_gettime failed");
> exit(1);
>   }
>   long result = tp.tv_sec + tp.tv_nsec / 10;
>   return result;
> }
>
> int main() {
>   printf("monitoring monotonic clock...\n");
>   long last = get_time();
>   while(1) {
> long now = get_time();
> if (now < last) {
>   printf("clock went backwards by %ld seconds!\n",
> last - now);
> }
> last = now;
> sleep(1);
>   }
>   return 0;
> }
> ---
> when running
> # date -s 2040-1-1
> # date -s 2037-1-1
> program outputs: clock went backwards by 4294967295 seconds!
>
> [8.] Environment: any ppc64, currently reproducing on qemu-system-ppc64le 
> running debian unstable
> [X.] Other notes, patches, fixes, workarounds:
> The problem seems to be in vDSO code in 
> arch/powerpc/kernel/vdso64/gettimeofday.S.
> (possibly because some values used in the calculation are only 32 bit?)
> Slightly silly workaround:
> nuke the "cmpwi cr1,r3,CLOCK_MONOTONIC" in __kernel_clock_gettime
> Now it always goes through the syscall fallback which does not have the same 
> problem.
>
> Regards,
> Jakub Drnec


Re: PROBLEM: monotonic clock going backwards on ppc64

2019-02-27 Thread Daniel Axtens
Hi,

> I think I observed a potential problem, is this the correct place to report 
> it? (CC me, not on list)

Yes

> [1.] One line summary: monotonic clock can be made to decrease on ppc64
> [2.] Full description:
> Setting the realtime clock can sometimes make the monotonic clock go back by 
> over a hundred years.
> Decreasing the realtime clock across the y2k38 threshold is one reliable way 
> to reproduce.
> Allegedly this can also happen just by running ntpd, I have not managed to 
> reproduce that other
> than booting with rtc at >2038 and then running ntp.
> When this happens, anything with timers (e.g. openjdk) breaks rather badly.

I can confirm this on a physical 64bit book3e system and powerpc/next
with the provided reproducer.

> [X.] Other notes, patches, fixes, workarounds:
> The problem seems to be in vDSO code in 
> arch/powerpc/kernel/vdso64/gettimeofday.S.
> (possibly because some values used in the calculation are only 32 bit?)
> Slightly silly workaround: 
> nuke the "cmpwi cr1,r3,CLOCK_MONOTONIC" in __kernel_clock_gettime
> Now it always goes through the syscall fallback which does not have the same 
> problem.

I'll look at the asm but I'm no expert here so we might need to defer to
someone with real expertise. Falling back to the syscall sadly ruins
the performance gain of having a vDSO.

Regards,
Daniel



PROBLEM: monotonic clock going backwards on ppc64

2019-02-26 Thread Jakub Drnec
Hi all,

I think I observed a potential problem, is this the correct place to report it? 
(CC me, not on list)

[1.] One line summary: monotonic clock can be made to decrease on ppc64
[2.] Full description:
Setting the realtime clock can sometimes make the monotonic clock go back by 
over a hundred years.
Decreasing the realtime clock across the y2k38 threshold is one reliable way to 
reproduce.
Allegedly this can also happen just by running ntpd, I have not managed to 
reproduce that other
than booting with rtc at >2038 and then running ntp.
When this happens, anything with timers (e.g. openjdk) breaks rather badly.

[3.] Keywords: gettimeofday, ppc64, vdso
[4.] Kernel information
[4.1.] Kernel version: any (tested on 4.19)
[4.2.] Kernel .config file: any
[5.] Most recent kernel version which did not have the bug: not a regression
[6.] Output of Oops..: not applicable
[7.] Example program which triggers the problem
--- testcase.c
#include 
#include 
#include 
#include 

long get_time() {
  struct timespec tp;
  if (clock_gettime(CLOCK_MONOTONIC, ) != 0) {
perror("clock_gettime failed");
exit(1);
  }
  long result = tp.tv_sec + tp.tv_nsec / 10;
  return result;
}

int main() {
  printf("monitoring monotonic clock...\n");
  long last = get_time();
  while(1) {
long now = get_time();
if (now < last) {
  printf("clock went backwards by %ld seconds!\n",
last - now);
}
last = now;
sleep(1);
  }
  return 0;
}
---
when running
# date -s 2040-1-1
# date -s 2037-1-1
program outputs: clock went backwards by 4294967295 seconds!

[8.] Environment: any ppc64, currently reproducing on qemu-system-ppc64le 
running debian unstable
[X.] Other notes, patches, fixes, workarounds:
The problem seems to be in vDSO code in 
arch/powerpc/kernel/vdso64/gettimeofday.S.
(possibly because some values used in the calculation are only 32 bit?)
Slightly silly workaround:
nuke the "cmpwi cr1,r3,CLOCK_MONOTONIC" in __kernel_clock_gettime
Now it always goes through the syscall fallback which does not have the same 
problem.

Regards,
Jakub Drnec