Re: hrtimer: one more expiry time overflow check in hrtimer_interrupt

2013-07-05 Thread Thomas Gleixner
On Fri, 5 Jul 2013, Shinya Kuribayashi wrote:
> On 6/28/2013 9:22 PM, Thomas Gleixner wrote:
> >> On the other hand, we have another call site of tick_program_event() at
> >> the bottom of hrtimer_interrupt().  The warning this time is triggered
> >> there, so we need to apply the same fix to it.
> > 
> > Well, the problem is that you are just papering over the underlying
> > issue of 32bit systems not being prepared for the year 2038 issue.
> > 
> > Just blindly silencing the warning is not going to make the system
> > survive 2038 in any sane way. All timespec/val related time functions
> > dealing with the clock realtime domain are simply broken in 2038 on
> > 32bit, so it does not matter whether a warning triggers or not.
> 
> You're right.  With this patch applied, the hrtimer_interrupt /looks/
> back to normal, but /proc/timer_list still show that "expires at [in
> negative range]":
> 
> active timers:
>  #0: tick_cpu_sched, tick_sched_timer, S:01
>  # expires at 5081250-5081250 nsecs [in -165398341280 to 
> -165398341280 nsecs]
> 
> This shouldn't happen and something weird is still going on.

I told you :)
 
> > We really need to tackle the underlying problem and not bandaiding a
> > known to be broken system.
> 
> Agreed, but a little bit hard task for me.  This is 100% reproducible,
> so I can help debug / verification.  Please let me know, if necessary.

That will take some time. It's quite a huge change all over the
place. We are aware of that fact and we have ideas how to fix it, but
that won't happen in the near future as we need to discuss the user
space visible changes as well. But we have started to discuss possible
solutions, so this will be solved way before 1.1.2038 :)

Thanks,

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: hrtimer: one more expiry time overflow check in hrtimer_interrupt

2013-07-05 Thread Shinya Kuribayashi
On 6/28/2013 9:22 PM, Thomas Gleixner wrote:
>> On the other hand, we have another call site of tick_program_event() at
>> the bottom of hrtimer_interrupt().  The warning this time is triggered
>> there, so we need to apply the same fix to it.
> 
> Well, the problem is that you are just papering over the underlying
> issue of 32bit systems not being prepared for the year 2038 issue.
> 
> Just blindly silencing the warning is not going to make the system
> survive 2038 in any sane way. All timespec/val related time functions
> dealing with the clock realtime domain are simply broken in 2038 on
> 32bit, so it does not matter whether a warning triggers or not.

You're right.  With this patch applied, the hrtimer_interrupt /looks/
back to normal, but /proc/timer_list still show that "expires at [in
negative range]":

active timers:
 #0: tick_cpu_sched, tick_sched_timer, S:01
 # expires at 5081250-5081250 nsecs [in -165398341280 to 
-165398341280 nsecs]

This shouldn't happen and something weird is still going on.

> We really need to tackle the underlying problem and not bandaiding a
> known to be broken system.

Agreed, but a little bit hard task for me.  This is 100% reproducible,
so I can help debug / verification.  Please let me know, if necessary.

Thanks for your comments,
--
Shinya Kuribayashi
Renesas Electronics
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: hrtimer: one more expiry time overflow check in hrtimer_interrupt

2013-07-05 Thread Shinya Kuribayashi
On 6/28/2013 9:22 PM, Thomas Gleixner wrote:
 On the other hand, we have another call site of tick_program_event() at
 the bottom of hrtimer_interrupt().  The warning this time is triggered
 there, so we need to apply the same fix to it.
 
 Well, the problem is that you are just papering over the underlying
 issue of 32bit systems not being prepared for the year 2038 issue.
 
 Just blindly silencing the warning is not going to make the system
 survive 2038 in any sane way. All timespec/val related time functions
 dealing with the clock realtime domain are simply broken in 2038 on
 32bit, so it does not matter whether a warning triggers or not.

You're right.  With this patch applied, the hrtimer_interrupt /looks/
back to normal, but /proc/timer_list still show that expires at [in
negative range]:

active timers:
 #0: tick_cpu_sched, tick_sched_timer, S:01
 # expires at 5081250-5081250 nsecs [in -165398341280 to 
-165398341280 nsecs]

This shouldn't happen and something weird is still going on.

 We really need to tackle the underlying problem and not bandaiding a
 known to be broken system.

Agreed, but a little bit hard task for me.  This is 100% reproducible,
so I can help debug / verification.  Please let me know, if necessary.

Thanks for your comments,
--
Shinya Kuribayashi
Renesas Electronics
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: hrtimer: one more expiry time overflow check in hrtimer_interrupt

2013-07-05 Thread Thomas Gleixner
On Fri, 5 Jul 2013, Shinya Kuribayashi wrote:
 On 6/28/2013 9:22 PM, Thomas Gleixner wrote:
  On the other hand, we have another call site of tick_program_event() at
  the bottom of hrtimer_interrupt().  The warning this time is triggered
  there, so we need to apply the same fix to it.
  
  Well, the problem is that you are just papering over the underlying
  issue of 32bit systems not being prepared for the year 2038 issue.
  
  Just blindly silencing the warning is not going to make the system
  survive 2038 in any sane way. All timespec/val related time functions
  dealing with the clock realtime domain are simply broken in 2038 on
  32bit, so it does not matter whether a warning triggers or not.
 
 You're right.  With this patch applied, the hrtimer_interrupt /looks/
 back to normal, but /proc/timer_list still show that expires at [in
 negative range]:
 
 active timers:
  #0: tick_cpu_sched, tick_sched_timer, S:01
  # expires at 5081250-5081250 nsecs [in -165398341280 to 
 -165398341280 nsecs]
 
 This shouldn't happen and something weird is still going on.

I told you :)
 
  We really need to tackle the underlying problem and not bandaiding a
  known to be broken system.
 
 Agreed, but a little bit hard task for me.  This is 100% reproducible,
 so I can help debug / verification.  Please let me know, if necessary.

That will take some time. It's quite a huge change all over the
place. We are aware of that fact and we have ideas how to fix it, but
that won't happen in the near future as we need to discuss the user
space visible changes as well. But we have started to discuss possible
solutions, so this will be solved way before 1.1.2038 :)

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: hrtimer: one more expiry time overflow check in hrtimer_interrupt

2013-06-28 Thread Thomas Gleixner
On Tue, 11 Jun 2013, Shinya Kuribayashi wrote:

> When executing a date command to set the system date and time to a few
> seconds before the 2038 problem expiration time, we got a WARN_ON_ONCE()
> like this:
> 
>   root@renesas:~# date -s "2038-1-19 3:14:00"
>   Tue Jan 19 03:14:00 GMT 2038
>   (then wait for 7-8 seconds)
>   root@renesas:~# [   27.662658] [ cut here ]
>   [   27.667297] WARNING: at kernel/time/clockevents.c:209 
> clockevents_program_event+0x3c/0x138()

> I found a similar issue fixed in v3.9 by Prarit Bhargava in commit
> 8f294b5a13 (hrtimer: Add expiry time overflow check in hrtimer_interrupt,
> 2013-04-08).  It tried to resolve a overflow issue detected around 1970
> + 100 seconds.
> 
> On the other hand, we have another call site of tick_program_event() at
> the bottom of hrtimer_interrupt().  The warning this time is triggered
> there, so we need to apply the same fix to it.

Well, the problem is that you are just papering over the underlying
issue of 32bit systems not being prepared for the year 2038 issue.

Just blindly silencing the warning is not going to make the system
survive 2038 in any sane way. All timespec/val related time functions
dealing with the clock realtime domain are simply broken in 2038 on
32bit, so it does not matter whether a warning triggers or not.
 
We really need to tackle the underlying problem and not bandaiding a
known to be broken system.

Thanks,

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: hrtimer: one more expiry time overflow check in hrtimer_interrupt

2013-06-28 Thread Thomas Gleixner
On Tue, 11 Jun 2013, Shinya Kuribayashi wrote:

 When executing a date command to set the system date and time to a few
 seconds before the 2038 problem expiration time, we got a WARN_ON_ONCE()
 like this:
 
   root@renesas:~# date -s 2038-1-19 3:14:00
   Tue Jan 19 03:14:00 GMT 2038
   (then wait for 7-8 seconds)
   root@renesas:~# [   27.662658] [ cut here ]
   [   27.667297] WARNING: at kernel/time/clockevents.c:209 
 clockevents_program_event+0x3c/0x138()

 I found a similar issue fixed in v3.9 by Prarit Bhargava in commit
 8f294b5a13 (hrtimer: Add expiry time overflow check in hrtimer_interrupt,
 2013-04-08).  It tried to resolve a overflow issue detected around 1970
 + 100 seconds.
 
 On the other hand, we have another call site of tick_program_event() at
 the bottom of hrtimer_interrupt().  The warning this time is triggered
 there, so we need to apply the same fix to it.

Well, the problem is that you are just papering over the underlying
issue of 32bit systems not being prepared for the year 2038 issue.

Just blindly silencing the warning is not going to make the system
survive 2038 in any sane way. All timespec/val related time functions
dealing with the clock realtime domain are simply broken in 2038 on
32bit, so it does not matter whether a warning triggers or not.
 
We really need to tackle the underlying problem and not bandaiding a
known to be broken system.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: hrtimer: one more expiry time overflow check in hrtimer_interrupt

2013-06-14 Thread Shinya Kuribayashi
Hello,

On 6/12/2013 9:21 PM, Prarit Bhargava wrote:
> On 06/11/2013 03:41 AM, Shinya Kuribayashi wrote:
>>   [   27.857330] hrtimer: interrupt took 0 ns
> 
> ^^^ see below ...

This may be because the frequency of our tick timer source in this
system is very slow, it's 32768 Hz.  I think it's not suitable for
the high resolution timer at all, but we had no choice...

And with this patch applied,

> diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
> index cdd5607..bc37552 100644
> --- a/kernel/hrtimer.c
> +++ b/kernel/hrtimer.c
> @@ -1371,6 +1371,8 @@ retry:
>   tick_program_event(expires_next, 1);
>   printk_once(KERN_WARNING "hrtimer: interrupt took %llu ns\n",
>   ktime_to_ns(delta));
> + printk_once(KERN_WARNING "# now=%lld entry_time=%lld\n", 
> ktime_to_ns(now), ktime_to_ns(entry_time));
> + printk_once(KERN_WARNING "# expires_next=%lld\n", 
> ktime_to_ns(expires_next));
>  }
>  
>  /*

I've got this:

[   22.857849] hrtimer: interrupt took 0 ns
[   22.861755] # now=-4294967273343634023 entry_time=-4294967273343634023
[   22.868286] # expires_next=-4294967273343634023

now == entry_time == expires_next, and delta == 0.

This means:
1) there haven't been no tick interrupts between 'entry_time' and 'now',
2) delta can be zero,
3) expires_next = ktime_add(now, delta) doesn't move expires_next forward.

Ok, it's simply hrtimer is not supposed to work with such a slow timer
source.  However, this issue happens on other shmobile ARM systems with
more faster tick timer source (10--13MHz), see below.

>> @@ -1368,6 +1370,8 @@ retry:
>>  expires_next = ktime_add_ns(now, 100 * NSEC_PER_MSEC);
>>  else
>>  expires_next = ktime_add(now, delta);
>> +if (expires_next.tv64 < 0)
>> +expires_next.tv64 = KTIME_MAX;
> 
> Even with this change you will still see the warning below if delta = 0.

Correct.

>>  tick_program_event(expires_next, 1);
>>  printk_once(KERN_WARNING "hrtimer: interrupt took %llu ns\n",
>>  ktime_to_ns(delta));
> 
> So I'm not sure that this is the correct thing to do.
> 
> Is this reproducible on any ARM system?  I'll grab an x86_64 box and give it a
> shot there too.  Can you dump the values of now, delta, and expires_next when
> the printk_once triggers?

It's 100% reproducible in our shmobile ARM kernels, v3.4 and v3.10-rc2.
And it doesn't reproduce with v2.6.35-based kernels so far.

Here're logs from Cortex-A15 UP/SMP systems.  They're having ARM
architected tiemrs (the Generic Timers, running at 10--13MHz) using as
clocksource, clockevents, sched_clock, and read_current_timer (udelay)
sources.

-
machine: R-Mobile APE6 (Cortex-A15 UP system - it's octo-core A15/A7 SoC, but 
not yet available in mainline)
kernel : v3.10-rc2 mainline
timer  : the Generic Timers (arch/arm/kernel/arch_timer.c) running at 13MHz
-
|# uname -a
|Linux arm 3.10.0-rc2-00966-gd658f9e #1284 Tue Jun 11 13:36:10 JST 2013 armv7l 
GNU/Linux
|# cat /proc/timer_list |egrep "Clock Event Device|event_handler"
|Clock Event Device: arch_sys_timer
| event_handler:  hrtimer_interrupt
|# date -s "2038-1-19 3:14:00"
|Tue Jan 19 03:14:00 UTC 2038
|# [  138.942325] [ cut here ]
|[  138.946910] WARNING: at kernel/time/clockevents.c:209 
clockevents_program_event+0x130/0x148()
|[  138.955315] Modules linked in:
|[  138.958339] CPU: 0 PID: 0 Comm: swapper Tainted: GW
3.10.0-rc2-00966-gd658f9e #1284
|[...]
|[  139.131970] ---[ end trace f879bec39a6bf85c ]---
|[  139.136521] hrtimer: interrupt took 2385 ns

-
machine: R-Car H2 (quad-core Cortex-A15 SMP system)
kernel : v3.4-based custom Android kernel
timer  : the Generic Timers (arch/arm/kernel/arch_timer.c) running at 10MHz
-
|root@renesas:~# date -s "2038-1-19 3:14:00"
|Tue Jan 19 03:14:00 GMT 2038
|root@renesas:~# [   26.458377] [ cut here ]
|[   26.472214] WARNING: at kernel/time/clockevents.c:209 
clockevents_program_event+0x3c/0x138()
|[...]
|[   26.980342] ---[ end trace 5218f4fd6603f493 ]---
|[   26.994166] hrtimer: interrupt took 1900 ns
|[   27.006687] # now=-4294967269549995552 entry_time=-4294967269549997452
|[   27.026240] # expires_next=-4294967269549993652

I also gave a try on my Ubuntu box, but it seems to work without
printk_once() at the bottome of hrtimer_interrupt() printed, which means
that hrtimer_update_base()-and-retries mechanism works as expected.

$ uname -a
Linux RI106878 3.2.0-45-generic #70-Ubuntu SMP Wed May 29 20:12:06 UTC 2013 
x86_64 x86_64 x86_64 GNU/Linux

I'd like to see whether this happens on other ARM machines excpet for
shmobile.  Could someone help, please?
--
Shinya Kuribayashi
Renesas 

Re: hrtimer: one more expiry time overflow check in hrtimer_interrupt

2013-06-14 Thread Shinya Kuribayashi
Hello,

On 6/12/2013 9:21 PM, Prarit Bhargava wrote:
 On 06/11/2013 03:41 AM, Shinya Kuribayashi wrote:
   [   27.857330] hrtimer: interrupt took 0 ns
 
 ^^^ see below ...

This may be because the frequency of our tick timer source in this
system is very slow, it's 32768 Hz.  I think it's not suitable for
the high resolution timer at all, but we had no choice...

And with this patch applied,

 diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
 index cdd5607..bc37552 100644
 --- a/kernel/hrtimer.c
 +++ b/kernel/hrtimer.c
 @@ -1371,6 +1371,8 @@ retry:
   tick_program_event(expires_next, 1);
   printk_once(KERN_WARNING hrtimer: interrupt took %llu ns\n,
   ktime_to_ns(delta));
 + printk_once(KERN_WARNING # now=%lld entry_time=%lld\n, 
 ktime_to_ns(now), ktime_to_ns(entry_time));
 + printk_once(KERN_WARNING # expires_next=%lld\n, 
 ktime_to_ns(expires_next));
  }
  
  /*

I've got this:

[   22.857849] hrtimer: interrupt took 0 ns
[   22.861755] # now=-4294967273343634023 entry_time=-4294967273343634023
[   22.868286] # expires_next=-4294967273343634023

now == entry_time == expires_next, and delta == 0.

This means:
1) there haven't been no tick interrupts between 'entry_time' and 'now',
2) delta can be zero,
3) expires_next = ktime_add(now, delta) doesn't move expires_next forward.

Ok, it's simply hrtimer is not supposed to work with such a slow timer
source.  However, this issue happens on other shmobile ARM systems with
more faster tick timer source (10--13MHz), see below.

 @@ -1368,6 +1370,8 @@ retry:
  expires_next = ktime_add_ns(now, 100 * NSEC_PER_MSEC);
  else
  expires_next = ktime_add(now, delta);
 +if (expires_next.tv64  0)
 +expires_next.tv64 = KTIME_MAX;
 
 Even with this change you will still see the warning below if delta = 0.

Correct.

  tick_program_event(expires_next, 1);
  printk_once(KERN_WARNING hrtimer: interrupt took %llu ns\n,
  ktime_to_ns(delta));
 
 So I'm not sure that this is the correct thing to do.
 
 Is this reproducible on any ARM system?  I'll grab an x86_64 box and give it a
 shot there too.  Can you dump the values of now, delta, and expires_next when
 the printk_once triggers?

It's 100% reproducible in our shmobile ARM kernels, v3.4 and v3.10-rc2.
And it doesn't reproduce with v2.6.35-based kernels so far.

Here're logs from Cortex-A15 UP/SMP systems.  They're having ARM
architected tiemrs (the Generic Timers, running at 10--13MHz) using as
clocksource, clockevents, sched_clock, and read_current_timer (udelay)
sources.

-
machine: R-Mobile APE6 (Cortex-A15 UP system - it's octo-core A15/A7 SoC, but 
not yet available in mainline)
kernel : v3.10-rc2 mainline
timer  : the Generic Timers (arch/arm/kernel/arch_timer.c) running at 13MHz
-
|# uname -a
|Linux arm 3.10.0-rc2-00966-gd658f9e #1284 Tue Jun 11 13:36:10 JST 2013 armv7l 
GNU/Linux
|# cat /proc/timer_list |egrep Clock Event Device|event_handler
|Clock Event Device: arch_sys_timer
| event_handler:  hrtimer_interrupt
|# date -s 2038-1-19 3:14:00
|Tue Jan 19 03:14:00 UTC 2038
|# [  138.942325] [ cut here ]
|[  138.946910] WARNING: at kernel/time/clockevents.c:209 
clockevents_program_event+0x130/0x148()
|[  138.955315] Modules linked in:
|[  138.958339] CPU: 0 PID: 0 Comm: swapper Tainted: GW
3.10.0-rc2-00966-gd658f9e #1284
|[...]
|[  139.131970] ---[ end trace f879bec39a6bf85c ]---
|[  139.136521] hrtimer: interrupt took 2385 ns

-
machine: R-Car H2 (quad-core Cortex-A15 SMP system)
kernel : v3.4-based custom Android kernel
timer  : the Generic Timers (arch/arm/kernel/arch_timer.c) running at 10MHz
-
|root@renesas:~# date -s 2038-1-19 3:14:00
|Tue Jan 19 03:14:00 GMT 2038
|root@renesas:~# [   26.458377] [ cut here ]
|[   26.472214] WARNING: at kernel/time/clockevents.c:209 
clockevents_program_event+0x3c/0x138()
|[...]
|[   26.980342] ---[ end trace 5218f4fd6603f493 ]---
|[   26.994166] hrtimer: interrupt took 1900 ns
|[   27.006687] # now=-4294967269549995552 entry_time=-4294967269549997452
|[   27.026240] # expires_next=-4294967269549993652

I also gave a try on my Ubuntu box, but it seems to work without
printk_once() at the bottome of hrtimer_interrupt() printed, which means
that hrtimer_update_base()-and-retries mechanism works as expected.

$ uname -a
Linux RI106878 3.2.0-45-generic #70-Ubuntu SMP Wed May 29 20:12:06 UTC 2013 
x86_64 x86_64 x86_64 GNU/Linux

I'd like to see whether this happens on other ARM machines excpet for
shmobile.  Could someone help, please?
--
Shinya Kuribayashi
Renesas Electronics
--
To unsubscribe from this list: send the line 

Re: hrtimer: one more expiry time overflow check in hrtimer_interrupt

2013-06-12 Thread Prarit Bhargava


On 06/11/2013 03:41 AM, Shinya Kuribayashi wrote:
> When executing a date command to set the system date and time to a few
> seconds before the 2038 problem expiration time, we got a WARN_ON_ONCE()
> like this:
> 
>   root@renesas:~# date -s "2038-1-19 3:14:00"
>   Tue Jan 19 03:14:00 GMT 2038
>   (then wait for 7-8 seconds)
>   root@renesas:~# [   27.662658] [ cut here ]
>   [   27.667297] WARNING: at kernel/time/clockevents.c:209 
> clockevents_program_event+0x3c/0x138()
>   [   27.675720] Modules linked in:
>   [   27.678802] [] (unwind_backtrace+0x0/0xe0) from [] 
> (warn_slowpath_common+0x4c/0x64)
>   [   27.688201] [] (warn_slowpath_common+0x4c/0x64) from 
> [] (warn_slowpath_null+0x18/0x1c)
>   [   27.697845] [] (warn_slowpath_null+0x18/0x1c) from 
> [] (clockevents_program_event+0x3c/0x138)
>   [   27.708007] [] (clockevents_program_event+0x3c/0x138) from 
> [] (tick_program_event+0x2c/0x34)
>   [   27.718170] [] (tick_program_event+0x2c/0x34) from 
> [] (hrtimer_interrupt+0x268/0x2a8)
>   [   27.727752] [] (hrtimer_interrupt+0x268/0x2a8) from 
> [] (cmt_timer_interrupt+0x2c/0x34)
>   [   27.737396] [] (cmt_timer_interrupt+0x2c/0x34) from 
> [] (handle_irq_event_percpu+0xb0/0x2a8)
>   [   27.747467] [] (handle_irq_event_percpu+0xb0/0x2a8) from 
> [] (handle_irq_event+0x58/0x74)
>   [   27.757293] [] (handle_irq_event+0x58/0x74) from [] 
> (handle_fasteoi_irq+0xc0/0x148)
>   [   27.72] [] (handle_fasteoi_irq+0xc0/0x148) from 
> [] (generic_handle_irq+0x20/0x30)
>   [   27.776245] [] (generic_handle_irq+0x20/0x30) from 
> [] (handle_IRQ+0x60/0x84)
>   [   27.785003] [] (handle_IRQ+0x60/0x84) from [] 
> (gic_handle_irq+0x34/0x4c)
>   [   27.793426] [] (gic_handle_irq+0x34/0x4c) from [] 
> (__irq_svc+0x40/0x70)
>   [   27.801788] Exception stack(0xc04aff68 to 0xc04affb0)
>   [   27.806823] ff60:    f010 0001  
> c04ae000 c04ec388
>   [   27.815002] ff80: c04b604c c0840d80 40004059 412fc093   
> c04ce140 c04affb0
>   [   27.823150] ffa0: c000f064 c000f068 6013 
>   [   27.828216] [] (__irq_svc+0x40/0x70) from [] 
> (default_idle+0x24/0x2c)
>   [   27.836395] [] (default_idle+0x24/0x2c) from [] 
> (cpu_idle+0x74/0xc8)
>   [   27.844451] [] (cpu_idle+0x74/0xc8) from [] 
> (start_kernel+0x248/0x288)
>   [   27.852722] ---[ end trace 9d8ad385bde80fd3 ]---


>   [   27.857330] hrtimer: interrupt took 0 ns

^^^ see below ...

> 
> This is triggered with our v3.4-based custom ARM kernel, but we
> confirmed that v3.10-rc can still have the same problem.
> 
> I found a similar issue fixed in v3.9 by Prarit Bhargava in commit
> 8f294b5a13 (hrtimer: Add expiry time overflow check in hrtimer_interrupt,
> 2013-04-08).  It tried to resolve a overflow issue detected around 1970
> + 100 seconds.
> 
> On the other hand, we have another call site of tick_program_event() at
> the bottom of hrtimer_interrupt().  The warning this time is triggered
> there, so we need to apply the same fix to it.
> 
> Reported-by: Hiroyuki Yokoyama 
> Signed-off-by: Shinya Kuribayashi 
> ---
> 
> Hi Prarit-san and John-san,
> 
> http://git.kernel.org/linus/8f294b5a139ee4b75e890ad5b443c93d1e558a8b
> hrtimer: Add expiry time overflow check in hrtimer_interrupt
> 
> I tried to fix the other case of overflow issues in hrtimer_interrupt(),
> but not sure it should be worked around like this in the first place.
> 
> Any comments are appreciated, thanks in advance.
> 
> diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
> index cdd5607..a42d712 100644
> --- a/kernel/hrtimer.c
> +++ b/kernel/hrtimer.c
> @@ -1368,6 +1370,8 @@ retry:
>   expires_next = ktime_add_ns(now, 100 * NSEC_PER_MSEC);
>   else
>   expires_next = ktime_add(now, delta);
> + if (expires_next.tv64 < 0)
> + expires_next.tv64 = KTIME_MAX;

Even with this change you will still see the warning below if delta = 0.

>   tick_program_event(expires_next, 1);
>   printk_once(KERN_WARNING "hrtimer: interrupt took %llu ns\n",
>   ktime_to_ns(delta));

So I'm not sure that this is the correct thing to do.

Is this reproducible on any ARM system?  I'll grab an x86_64 box and give it a
shot there too.  Can you dump the values of now, delta, and expires_next when
the printk_once triggers?

P.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: hrtimer: one more expiry time overflow check in hrtimer_interrupt

2013-06-12 Thread Prarit Bhargava


On 06/11/2013 03:41 AM, Shinya Kuribayashi wrote:
 When executing a date command to set the system date and time to a few
 seconds before the 2038 problem expiration time, we got a WARN_ON_ONCE()
 like this:
 
   root@renesas:~# date -s 2038-1-19 3:14:00
   Tue Jan 19 03:14:00 GMT 2038
   (then wait for 7-8 seconds)
   root@renesas:~# [   27.662658] [ cut here ]
   [   27.667297] WARNING: at kernel/time/clockevents.c:209 
 clockevents_program_event+0x3c/0x138()
   [   27.675720] Modules linked in:
   [   27.678802] [c00130ec] (unwind_backtrace+0x0/0xe0) from [c001f4d8] 
 (warn_slowpath_common+0x4c/0x64)
   [   27.688201] [c001f4d8] (warn_slowpath_common+0x4c/0x64) from 
 [c001f508] (warn_slowpath_null+0x18/0x1c)
   [   27.697845] [c001f508] (warn_slowpath_null+0x18/0x1c) from 
 [c00549bc] (clockevents_program_event+0x3c/0x138)
   [   27.708007] [c00549bc] (clockevents_program_event+0x3c/0x138) from 
 [c005510c] (tick_program_event+0x2c/0x34)
   [   27.718170] [c005510c] (tick_program_event+0x2c/0x34) from 
 [c003fa98] (hrtimer_interrupt+0x268/0x2a8)
   [   27.727752] [c003fa98] (hrtimer_interrupt+0x268/0x2a8) from 
 [c00180c8] (cmt_timer_interrupt+0x2c/0x34)
   [   27.737396] [c00180c8] (cmt_timer_interrupt+0x2c/0x34) from 
 [c0066748] (handle_irq_event_percpu+0xb0/0x2a8)
   [   27.747467] [c0066748] (handle_irq_event_percpu+0xb0/0x2a8) from 
 [c0066998] (handle_irq_event+0x58/0x74)
   [   27.757293] [c0066998] (handle_irq_event+0x58/0x74) from [c0068f24] 
 (handle_fasteoi_irq+0xc0/0x148)
   [   27.72] [c0068f24] (handle_fasteoi_irq+0xc0/0x148) from 
 [c0066014] (generic_handle_irq+0x20/0x30)
   [   27.776245] [c0066014] (generic_handle_irq+0x20/0x30) from 
 [c000ef54] (handle_IRQ+0x60/0x84)
   [   27.785003] [c000ef54] (handle_IRQ+0x60/0x84) from [c0009334] 
 (gic_handle_irq+0x34/0x4c)
   [   27.793426] [c0009334] (gic_handle_irq+0x34/0x4c) from [c000e2c0] 
 (__irq_svc+0x40/0x70)
   [   27.801788] Exception stack(0xc04aff68 to 0xc04affb0)
   [   27.806823] ff60:    f010 0001  
 c04ae000 c04ec388
   [   27.815002] ff80: c04b604c c0840d80 40004059 412fc093   
 c04ce140 c04affb0
   [   27.823150] ffa0: c000f064 c000f068 6013 
   [   27.828216] [c000e2c0] (__irq_svc+0x40/0x70) from [c000f068] 
 (default_idle+0x24/0x2c)
   [   27.836395] [c000f068] (default_idle+0x24/0x2c) from [c000f338] 
 (cpu_idle+0x74/0xc8)
   [   27.844451] [c000f338] (cpu_idle+0x74/0xc8) from [c048c6d4] 
 (start_kernel+0x248/0x288)
   [   27.852722] ---[ end trace 9d8ad385bde80fd3 ]---


   [   27.857330] hrtimer: interrupt took 0 ns

^^^ see below ...

 
 This is triggered with our v3.4-based custom ARM kernel, but we
 confirmed that v3.10-rc can still have the same problem.
 
 I found a similar issue fixed in v3.9 by Prarit Bhargava in commit
 8f294b5a13 (hrtimer: Add expiry time overflow check in hrtimer_interrupt,
 2013-04-08).  It tried to resolve a overflow issue detected around 1970
 + 100 seconds.
 
 On the other hand, we have another call site of tick_program_event() at
 the bottom of hrtimer_interrupt().  The warning this time is triggered
 there, so we need to apply the same fix to it.
 
 Reported-by: Hiroyuki Yokoyama hiroyuki.yokoyama...@renesas.com
 Signed-off-by: Shinya Kuribayashi shinya.kuribayashi...@renesas.com
 ---
 
 Hi Prarit-san and John-san,
 
 http://git.kernel.org/linus/8f294b5a139ee4b75e890ad5b443c93d1e558a8b
 hrtimer: Add expiry time overflow check in hrtimer_interrupt
 
 I tried to fix the other case of overflow issues in hrtimer_interrupt(),
 but not sure it should be worked around like this in the first place.
 
 Any comments are appreciated, thanks in advance.
 
 diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
 index cdd5607..a42d712 100644
 --- a/kernel/hrtimer.c
 +++ b/kernel/hrtimer.c
 @@ -1368,6 +1370,8 @@ retry:
   expires_next = ktime_add_ns(now, 100 * NSEC_PER_MSEC);
   else
   expires_next = ktime_add(now, delta);
 + if (expires_next.tv64  0)
 + expires_next.tv64 = KTIME_MAX;

Even with this change you will still see the warning below if delta = 0.

   tick_program_event(expires_next, 1);
   printk_once(KERN_WARNING hrtimer: interrupt took %llu ns\n,
   ktime_to_ns(delta));

So I'm not sure that this is the correct thing to do.

Is this reproducible on any ARM system?  I'll grab an x86_64 box and give it a
shot there too.  Can you dump the values of now, delta, and expires_next when
the printk_once triggers?

P.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


hrtimer: one more expiry time overflow check in hrtimer_interrupt

2013-06-11 Thread Shinya Kuribayashi
When executing a date command to set the system date and time to a few
seconds before the 2038 problem expiration time, we got a WARN_ON_ONCE()
like this:

  root@renesas:~# date -s "2038-1-19 3:14:00"
  Tue Jan 19 03:14:00 GMT 2038
  (then wait for 7-8 seconds)
  root@renesas:~# [   27.662658] [ cut here ]
  [   27.667297] WARNING: at kernel/time/clockevents.c:209 
clockevents_program_event+0x3c/0x138()
  [   27.675720] Modules linked in:
  [   27.678802] [] (unwind_backtrace+0x0/0xe0) from [] 
(warn_slowpath_common+0x4c/0x64)
  [   27.688201] [] (warn_slowpath_common+0x4c/0x64) from 
[] (warn_slowpath_null+0x18/0x1c)
  [   27.697845] [] (warn_slowpath_null+0x18/0x1c) from [] 
(clockevents_program_event+0x3c/0x138)
  [   27.708007] [] (clockevents_program_event+0x3c/0x138) from 
[] (tick_program_event+0x2c/0x34)
  [   27.718170] [] (tick_program_event+0x2c/0x34) from [] 
(hrtimer_interrupt+0x268/0x2a8)
  [   27.727752] [] (hrtimer_interrupt+0x268/0x2a8) from [] 
(cmt_timer_interrupt+0x2c/0x34)
  [   27.737396] [] (cmt_timer_interrupt+0x2c/0x34) from [] 
(handle_irq_event_percpu+0xb0/0x2a8)
  [   27.747467] [] (handle_irq_event_percpu+0xb0/0x2a8) from 
[] (handle_irq_event+0x58/0x74)
  [   27.757293] [] (handle_irq_event+0x58/0x74) from [] 
(handle_fasteoi_irq+0xc0/0x148)
  [   27.72] [] (handle_fasteoi_irq+0xc0/0x148) from [] 
(generic_handle_irq+0x20/0x30)
  [   27.776245] [] (generic_handle_irq+0x20/0x30) from [] 
(handle_IRQ+0x60/0x84)
  [   27.785003] [] (handle_IRQ+0x60/0x84) from [] 
(gic_handle_irq+0x34/0x4c)
  [   27.793426] [] (gic_handle_irq+0x34/0x4c) from [] 
(__irq_svc+0x40/0x70)
  [   27.801788] Exception stack(0xc04aff68 to 0xc04affb0)
  [   27.806823] ff60:    f010 0001  
c04ae000 c04ec388
  [   27.815002] ff80: c04b604c c0840d80 40004059 412fc093   
c04ce140 c04affb0
  [   27.823150] ffa0: c000f064 c000f068 6013 
  [   27.828216] [] (__irq_svc+0x40/0x70) from [] 
(default_idle+0x24/0x2c)
  [   27.836395] [] (default_idle+0x24/0x2c) from [] 
(cpu_idle+0x74/0xc8)
  [   27.844451] [] (cpu_idle+0x74/0xc8) from [] 
(start_kernel+0x248/0x288)
  [   27.852722] ---[ end trace 9d8ad385bde80fd3 ]---
  [   27.857330] hrtimer: interrupt took 0 ns

This is triggered with our v3.4-based custom ARM kernel, but we
confirmed that v3.10-rc can still have the same problem.

I found a similar issue fixed in v3.9 by Prarit Bhargava in commit
8f294b5a13 (hrtimer: Add expiry time overflow check in hrtimer_interrupt,
2013-04-08).  It tried to resolve a overflow issue detected around 1970
+ 100 seconds.

On the other hand, we have another call site of tick_program_event() at
the bottom of hrtimer_interrupt().  The warning this time is triggered
there, so we need to apply the same fix to it.

Reported-by: Hiroyuki Yokoyama 
Signed-off-by: Shinya Kuribayashi 
---

Hi Prarit-san and John-san,

http://git.kernel.org/linus/8f294b5a139ee4b75e890ad5b443c93d1e558a8b
hrtimer: Add expiry time overflow check in hrtimer_interrupt

I tried to fix the other case of overflow issues in hrtimer_interrupt(),
but not sure it should be worked around like this in the first place.

Any comments are appreciated, thanks in advance.

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index cdd5607..a42d712 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1368,6 +1370,8 @@ retry:
expires_next = ktime_add_ns(now, 100 * NSEC_PER_MSEC);
else
expires_next = ktime_add(now, delta);
+   if (expires_next.tv64 < 0)
+   expires_next.tv64 = KTIME_MAX;
tick_program_event(expires_next, 1);
printk_once(KERN_WARNING "hrtimer: interrupt took %llu ns\n",
ktime_to_ns(delta));

--
Shinya Kuribayashi
Renesas Electronics
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


hrtimer: one more expiry time overflow check in hrtimer_interrupt

2013-06-11 Thread Shinya Kuribayashi
When executing a date command to set the system date and time to a few
seconds before the 2038 problem expiration time, we got a WARN_ON_ONCE()
like this:

  root@renesas:~# date -s 2038-1-19 3:14:00
  Tue Jan 19 03:14:00 GMT 2038
  (then wait for 7-8 seconds)
  root@renesas:~# [   27.662658] [ cut here ]
  [   27.667297] WARNING: at kernel/time/clockevents.c:209 
clockevents_program_event+0x3c/0x138()
  [   27.675720] Modules linked in:
  [   27.678802] [c00130ec] (unwind_backtrace+0x0/0xe0) from [c001f4d8] 
(warn_slowpath_common+0x4c/0x64)
  [   27.688201] [c001f4d8] (warn_slowpath_common+0x4c/0x64) from 
[c001f508] (warn_slowpath_null+0x18/0x1c)
  [   27.697845] [c001f508] (warn_slowpath_null+0x18/0x1c) from [c00549bc] 
(clockevents_program_event+0x3c/0x138)
  [   27.708007] [c00549bc] (clockevents_program_event+0x3c/0x138) from 
[c005510c] (tick_program_event+0x2c/0x34)
  [   27.718170] [c005510c] (tick_program_event+0x2c/0x34) from [c003fa98] 
(hrtimer_interrupt+0x268/0x2a8)
  [   27.727752] [c003fa98] (hrtimer_interrupt+0x268/0x2a8) from [c00180c8] 
(cmt_timer_interrupt+0x2c/0x34)
  [   27.737396] [c00180c8] (cmt_timer_interrupt+0x2c/0x34) from [c0066748] 
(handle_irq_event_percpu+0xb0/0x2a8)
  [   27.747467] [c0066748] (handle_irq_event_percpu+0xb0/0x2a8) from 
[c0066998] (handle_irq_event+0x58/0x74)
  [   27.757293] [c0066998] (handle_irq_event+0x58/0x74) from [c0068f24] 
(handle_fasteoi_irq+0xc0/0x148)
  [   27.72] [c0068f24] (handle_fasteoi_irq+0xc0/0x148) from [c0066014] 
(generic_handle_irq+0x20/0x30)
  [   27.776245] [c0066014] (generic_handle_irq+0x20/0x30) from [c000ef54] 
(handle_IRQ+0x60/0x84)
  [   27.785003] [c000ef54] (handle_IRQ+0x60/0x84) from [c0009334] 
(gic_handle_irq+0x34/0x4c)
  [   27.793426] [c0009334] (gic_handle_irq+0x34/0x4c) from [c000e2c0] 
(__irq_svc+0x40/0x70)
  [   27.801788] Exception stack(0xc04aff68 to 0xc04affb0)
  [   27.806823] ff60:    f010 0001  
c04ae000 c04ec388
  [   27.815002] ff80: c04b604c c0840d80 40004059 412fc093   
c04ce140 c04affb0
  [   27.823150] ffa0: c000f064 c000f068 6013 
  [   27.828216] [c000e2c0] (__irq_svc+0x40/0x70) from [c000f068] 
(default_idle+0x24/0x2c)
  [   27.836395] [c000f068] (default_idle+0x24/0x2c) from [c000f338] 
(cpu_idle+0x74/0xc8)
  [   27.844451] [c000f338] (cpu_idle+0x74/0xc8) from [c048c6d4] 
(start_kernel+0x248/0x288)
  [   27.852722] ---[ end trace 9d8ad385bde80fd3 ]---
  [   27.857330] hrtimer: interrupt took 0 ns

This is triggered with our v3.4-based custom ARM kernel, but we
confirmed that v3.10-rc can still have the same problem.

I found a similar issue fixed in v3.9 by Prarit Bhargava in commit
8f294b5a13 (hrtimer: Add expiry time overflow check in hrtimer_interrupt,
2013-04-08).  It tried to resolve a overflow issue detected around 1970
+ 100 seconds.

On the other hand, we have another call site of tick_program_event() at
the bottom of hrtimer_interrupt().  The warning this time is triggered
there, so we need to apply the same fix to it.

Reported-by: Hiroyuki Yokoyama hiroyuki.yokoyama...@renesas.com
Signed-off-by: Shinya Kuribayashi shinya.kuribayashi...@renesas.com
---

Hi Prarit-san and John-san,

http://git.kernel.org/linus/8f294b5a139ee4b75e890ad5b443c93d1e558a8b
hrtimer: Add expiry time overflow check in hrtimer_interrupt

I tried to fix the other case of overflow issues in hrtimer_interrupt(),
but not sure it should be worked around like this in the first place.

Any comments are appreciated, thanks in advance.

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index cdd5607..a42d712 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1368,6 +1370,8 @@ retry:
expires_next = ktime_add_ns(now, 100 * NSEC_PER_MSEC);
else
expires_next = ktime_add(now, delta);
+   if (expires_next.tv64  0)
+   expires_next.tv64 = KTIME_MAX;
tick_program_event(expires_next, 1);
printk_once(KERN_WARNING hrtimer: interrupt took %llu ns\n,
ktime_to_ns(delta));

--
Shinya Kuribayashi
Renesas Electronics
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/