Re: Timekeeping issue on aggressive suspend/resume

2010-06-14 Thread Suresh Rajashekara
On Thu, Jun 10, 2010 at 12:52 PM, john stultz johns...@us.ibm.com wrote:
 I think Thomas was suggesting that you consider creating an option
 where CLOCK_MONOTONIC included total_sleep_time.

 In that case the *hack* (and this is a hack, we'll need some more
 thoughtful discussion before anything like it could make it upstream)
 would be in timekeeping_resume() to comment out the lines that update
 wall_to_monotonic and total_sleep_time.

 It would be interesting to hear if that hack works for you, and we can
 try to come up with a better way to think about how to accommodate both
 views of how to account time over suspend.

Thanks.

I tried this fix. It seemed to help, however the accuracy of sleep
time for the process was not quite right. A process thread which was
supposed to wake up every (X) seconds, seemed to wake up every (X -
delta X) seconds.

Also another side effect of this change was that the system time was
no longer in sync with the wall time.

These problems were more pronounced when the suspend/wakeup cycle time
was brought down to 0.5 seconds from 4 seconds. The periodicity of
most of the process threads was disturbed.

I decided to NOT suspend/resume the timekeeping subsystem in the
kernel and try. It seemed to work. Every application seems to work
fine.

Now my question is: is it safe to disable suspend/resume of the
timekeeping subsystem? Will it have an effect (on
functionality/performance) which may not surface in my short
experiments?

Thanks in advance,

Suresh
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Timekeeping issue on aggressive suspend/resume

2010-06-14 Thread john stultz
On Mon, 2010-06-14 at 00:46 -0700, Suresh Rajashekara wrote:
 On Thu, Jun 10, 2010 at 12:52 PM, john stultz johns...@us.ibm.com wrote:
  I think Thomas was suggesting that you consider creating an option
  where CLOCK_MONOTONIC included total_sleep_time.
 
  In that case the *hack* (and this is a hack, we'll need some more
  thoughtful discussion before anything like it could make it upstream)
  would be in timekeeping_resume() to comment out the lines that update
  wall_to_monotonic and total_sleep_time.
 
  It would be interesting to hear if that hack works for you, and we can
  try to come up with a better way to think about how to accommodate both
  views of how to account time over suspend.
 
 Thanks.
 
 I tried this fix. It seemed to help, however the accuracy of sleep
 time for the process was not quite right. A process thread which was
 supposed to wake up every (X) seconds, seemed to wake up every (X -
 delta X) seconds.

Ah, the sleep time is probably too coarse (seconds). We probably need to
increase the granularity from read_persistent_clock() and see if that
helps (although most persistent clocks aren't very fine grained).
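
To illustrate the granularity concern, here is a minimal sketch of a
toy model (not kernel code) in which the persistent clock reports whole
seconds only. The function name and the millisecond inputs are
assumptions made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model (not kernel code): the persistent clock reports
 * whole seconds only, so both timestamps are truncated before the
 * sleep length is computed.
 */
static int64_t accounted_sleep_ms(int64_t suspend_ms, int64_t resume_ms)
{
    assert(suspend_ms >= 0 && resume_ms >= suspend_ms);

    int64_t suspend_s = suspend_ms / 1000; /* clock reading at suspend */
    int64_t resume_s = resume_ms / 1000;   /* clock reading at resume */

    return (resume_s - suspend_s) * 1000;
}
```

Under these assumptions a 500 ms suspend from t=1200 ms to t=1700 ms is
accounted as 0 ms, while one from t=1700 ms to t=2200 ms is accounted
as 1000 ms. That would be consistent with the error being more visible
at a 0.5 second cycle than at a 4 second cycle.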

 Also another side effect of this change was that the system time was
 no longer in sync with the wall time.

? This doesn't make much sense to me, as you shouldn't be manipulating
xtime differently.

Just to be clear, you mean the value from date doesn't match your
watch after resume? 

 These problems were more pronounced when the suspend/wakeup cycle time
 was brought down to 0.5 seconds from 4 seconds. The periodicity of
 most of the process threads was disturbed.
 
 I decided to NOT suspend/resume the timekeeping subsystem in the
 kernel and try. It seemed to work. Every application seems to work
 fine.
 
 Now my question is: is it safe to disable suspend/resume of the
 timekeeping subsystem? Will it have an effect (on
 functionality/performance) which may not surface in my short
 experiments?

Well, the difficulty here is what folks actually mean by suspend. On
some hardware it means everything is powered off, and so on resume we
have to re-init hardware values.

It seems in your case that the hardware isn't completely powered off,
since the clocksource you're using seemed to continue counting while the
system was suspended. 

So in this case you might be ok. Your suspend seems closer to a deep
idle state on x86. So suspending timekeeping might not be necessary.

However, you're right that there may be lurking issues:

1) The suspend time would have to be limited to the clocksource's
max_idle_ns value, since after that many cycles have passed, we might
overflow the accumulation function, or the clocksource may have wrapped.

2) If the hardware does reset the clocksource at some point during the
suspend, you'll have odd time issues.

3) You could run into some difficulty keeping close sync with an NTP
server, as the long delays between accumulation will probably cause an
oscillating over-shoot and over-correction.
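
As a rough illustration of the wrap bound in point 1, the sketch below
computes how long a free-running counter takes to wrap. The counter
width and frequency are assumptions for the example, and the helper
name is made up; the kernel's max_idle_ns is derived differently (it
also has to account for overflow in the accumulation), so this is only
the upper bound set by the wrap itself.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Seconds until a free-running counter of the given width wraps.
 * Hypothetical helper, not a kernel function.
 */
static uint64_t wrap_seconds(unsigned bits, uint64_t hz)
{
    assert(bits > 0 && bits < 64 && hz > 0);

    return (UINT64_C(1) << bits) / hz;
}
```

For an assumed 32-bit counter at 32768 Hz, wrap_seconds(32, 32768)
gives 131072 seconds, a little over 36 hours, which is far longer than
a 4 second suspend cycle.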

I suspect these different definitions of suspend on all of the
different hardware types out there are going to be a growing problem in
the near term, especially as deep idle states start to power off more
hardware and become closer to suspend in behavior.

thanks
-john



Re: Timekeeping issue on aggressive suspend/resume

2010-06-14 Thread Thomas Gleixner
On Mon, 14 Jun 2010, john stultz wrote:

 On Mon, 2010-06-14 at 00:46 -0700, Suresh Rajashekara wrote:
  On Thu, Jun 10, 2010 at 12:52 PM, john stultz johns...@us.ibm.com wrote:
   I think Thomas was suggesting that you consider creating an option
   where CLOCK_MONOTONIC included total_sleep_time.
  
   In that case the *hack* (and this is a hack, we'll need some more
   thoughtful discussion before anything like it could make it upstream)
   would be in timekeeping_resume() to comment out the lines that update
   wall_to_monotonic and total_sleep_time.
  
   It would be interesting to hear if that hack works for you, and we can
   try to come up with a better way to think about how to accommodate both
   views of how to account time over suspend.
  
  Thanks.
  
  I tried this fix. It seemed to help, however the accuracy of sleep
  time for the process was not quite right. A process thread which was
  supposed to wake up every (X) seconds, seemed to wake up every (X -
  delta X) seconds.
 
 Ah, the sleep time is probably too coarse (seconds). We probably need to
 increase the granularity from read_persistent_clock() and see if that
 helps (although most persistent clocks aren't very fine grained).
 
  Also another side effect of this change was that the system time was
  no longer in sync with the wall time.
 
 ? This doesn't make much sense to me, as you shouldn't be manipulating
 xtime differently.
 
 Just to be clear, you mean the value from date doesn't match your
 watch after resume? 
 
  These problems were more pronounced when the suspend/wakeup cycle time
  was brought down to 0.5 seconds from 4 seconds. The periodicity of
  most of the process threads was disturbed.
  
  I decided to NOT suspend/resume the timekeeping subsystem in the
  kernel and try. It seemed to work. Every application seems to work
  fine.
  
  Now my question is: is it safe to disable suspend/resume of the
  timekeeping subsystem? Will it have an effect (on
  functionality/performance) which may not surface in my short
  experiments?
 
 Well, the difficulty here is what folks actually mean by suspend. On
 some hardware it means everything is powered off, and so on resume we
 have to re-init hardware values.
 
 It seems in your case that the hardware isn't completely powered off,
 since the clocksource you're using seemed to continue counting while the
 system was suspended. 
 
 So in this case you might be ok. Your suspend seems closer to a deep
 idle state on x86. So suspending timekeeping might not be necessary.
 
 However, you're right that there may be lurking issues:
 
 1) The suspend time would have to be limited to the clocksource's
 max_idle_ns value, since after that many cycles have passed, we might
 overflow the accumulation function, or the clocksource may have wrapped.
 
 2) If the hardware does reset the clocksource at some point during the
 suspend, you'll have odd time issues.
 
 3) You could run into some difficulty keeping close sync with an NTP
 server, as the long delays between accumulation will probably cause an
 oscillating over-shoot and over-correction.
 
 I suspect these different definitions of suspend on all of the
 different hardware types out there are going to be a growing problem in
 the near term, especially as deep idle states start to power off more
 hardware and become closer to suspend in behavior.

I don't think it's really an issue. Such hardware uses a 32.768kHz
driven (RTC-like) clocksource/event which is never powered off and
not affected by suspend/resume unless you run out of battery. That
hardware provides sub second resolution (~30us) contrary to the PC
style RTC which gives you seconds only. That's really good enough for
timekeeping, NOHZ and even HIGHRES.
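
The ~30us figure follows directly from the clock frequency. A minimal
sketch of the arithmetic (the helper name is made up for this
illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Period of one tick in nanoseconds, rounded down. */
static uint64_t tick_period_ns(uint64_t hz)
{
    assert(hz > 0);

    return UINT64_C(1000000000) / hz;
}
```

tick_period_ns(32768) is 30517 ns, i.e. roughly 30.5 microseconds per
tick, compared with 1000000000 ns for a clock that only counts seconds.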

The NTP sync might become an issue for really long sleep times, but
that's an NTP problem and needs to be addressed separately.

Thanks,

tglx


Re: Timekeeping issue on aggressive suspend/resume

2010-06-11 Thread Thomas Petazzoni
Hello Suresh,

On Wed, 9 Jun 2010 12:50:39 -0700
Suresh Rajashekara suresh.raj+linuxo...@gmail.com wrote:

 I have an application (running on 2.6.29-omap1) which puts an OMAP1
 system to suspend aggressively. The system wakes up every 4 seconds
 and stays awake for about 35 milliseconds and sleeps again for another
 4 seconds. This design is to save power on a battery operated device.
 
 This aggressive suspend/resume action seems to create an issue for
 other applications in the system waiting for some timeout to happen
 (especially an application which is waiting using mq_timedreceive
 and is supposed to time out every 30 seconds; it seems to wake up every
 90 seconds). It seems like the timekeeping is not happening properly
 inside the kernel.
 
 If the suspend duration is changed from 4 seconds to 1 second, then
 things work somewhat better. On reducing it to 0.5 seconds (which was
 our earlier design on 2.6.16-rc3), the problem seems to disappear.

I've done a relatively similar thing on a different CPU architecture: in
the idle loop, when the CPU is going to be idle for a sufficiently long
period of time, I power down the CPU completely. Before that, I program
an RTC (clocked at 32 kHz) to wake up the CPU a little bit
*before* the expiration of the next timer. When the CPU wakes up, I
adjust the clocksource (in this case the CPU cycle counter) to
compensate for the time spent while the CPU was off, and I reprogram the
clockevents to make sure that the timer will actually expire at the
correct time, also by compensating for the time during which the CPU was
off (note: when the CPU is off, the cycle counter stops incrementing,
and the timer used as clockevents stops decrementing). This way, the
CLOCK_MONOTONIC time continues to go forward even when the CPU is off.
The goal was to make the "CPU is off" case just another idle state of
the system, which should be just as transparent to the life of the
system as other idle states. So an application that uses a periodic
timer of, say, 30 milliseconds will see its timer actually fire every
30 milliseconds even though the CPU goes off between timer
expirations (we've done measurements with a scope, and the timer really
expires every 30 milliseconds as expected).
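
The compensation described above could be sketched roughly as follows.
This is only an illustration under stated assumptions, not the actual
implementation: the RTC is assumed to tick at exactly 32768 Hz, and the
struct and function names are made up.

```c
#include <assert.h>
#include <stdint.h>

#define RTC_HZ 32768u /* assumed rate of the always-on RTC */

/* Cycles the CPU counter missed while the CPU was off (hypothetical). */
struct comp_counter {
    uint64_t offset;
};

/* Convert RTC ticks spent in the OFF state into CPU counter cycles. */
static uint64_t rtc_ticks_to_cycles(uint64_t rtc_ticks, uint64_t cpu_hz)
{
    assert(cpu_hz > 0);

    return rtc_ticks * cpu_hz / RTC_HZ;
}

/* Called on wake-up: remember how many cycles the counter missed. */
static void account_off_period(struct comp_counter *c,
                               uint64_t rtc_ticks, uint64_t cpu_hz)
{
    c->offset += rtc_ticks_to_cycles(rtc_ticks, cpu_hz);
}

/* Clocksource read: raw counter plus everything missed so far. */
static uint64_t read_compensated(const struct comp_counter *c,
                                 uint64_t raw_cycles)
{
    return raw_cycles + c->offset;
}
```

With an assumed 200 MHz cycle counter, an OFF period of 16384 RTC ticks
(half a second) adds 100000000 cycles to the offset, so the value seen
by timekeeping keeps advancing as if the counter had never stopped.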

FWIW, we do not use the normal suspend/resume infrastructure for this,
because it was way too slow (on the order of ~100ms). On the particular
hardware we're using, it takes roughly ~1ms to go OFF, and ~2ms to
completely wake up, so we can very aggressively put the CPU in the OFF
state.

However, the way we're doing the time compensation is quite hackish,
and it would be great to hear Thomas Gleixner's ideas on how this
should be implemented properly at the clocksource/clock_event_device
level.

Sincerely,

Thomas
-- 
Thomas Petazzoni, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com


Re: Timekeeping issue on aggressive suspend/resume

2010-06-10 Thread Suresh Rajashekara
On Wed, Jun 9, 2010 at 1:22 PM, Thomas Gleixner t...@linutronix.de wrote:
 Though we could change that conditionally - the default would still be
 the freeze of jiffies and CLOCK_MONOTONIC for historical compatibility.

If I were to change it only for our implementation, and make all the
user space timers use CLOCK_REALTIME, then could you please point me
in a direction as to what part of the kernel I should be touching to
make that change?

Earlier we faced an issue with the time that the application sees. It
wasn't getting updated when we suspended and resumed the system (whereas
the time inside the kernel kept updating) and hence would eventually
drift from the actual time.

For example, if I use this loop at the command prompt

while date
do
echo mem > /sys/power/state
done

then the date command always displayed the same time, but the prints
from the kernel (I was using the printk time information) were
advancing as expected.

I found a patch at
https://patchwork.kernel.org/patch/50070/

Though this fixed the application time update issue, there are a lot of
timers in the application which are still not working right.

Could anyone please point me in some direction to find the solution?

Thanks,
Suresh


Re: Timekeeping issue on aggressive suspend/resume

2010-06-10 Thread john stultz
On Wed, 2010-06-09 at 23:34 -0700, Suresh Rajashekara wrote:
 On Wed, Jun 9, 2010 at 1:22 PM, Thomas Gleixner t...@linutronix.de wrote:
  Though we could change that conditionally - the default would still be
  the freeze of jiffies and CLOCK_MONOTONIC for historical compatibility.
 
 If I were to change it only for our implementation, and make all the
 user space timers use CLOCK_REALTIME, then could you please point me
 in a direction as to what part of the kernel I should be touching to
 make that change?

I think Thomas was suggesting that you consider creating an option
where CLOCK_MONOTONIC included total_sleep_time.

In that case the *hack* (and this is a hack, we'll need some more
thoughtful discussion before anything like it could make it upstream)
would be in timekeeping_resume() to comment out the lines that update
wall_to_monotonic and total_sleep_time.
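
For readers unfamiliar with the bookkeeping, here is a simplified
userspace model of what the hack changes. It is a sketch, not the
kernel's timekeeping_resume(): the field names mirror the kernel
variables mentioned above, but the struct, the functions, the
whole-second resolution and the flag are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the timekeeping state, in whole seconds. */
struct tk_model {
    int64_t xtime_s;             /* wall-clock time */
    int64_t wall_to_monotonic_s; /* xtime + this = CLOCK_MONOTONIC */
    int64_t total_sleep_s;       /* time spent suspended so far */
};

static int64_t monotonic_s(const struct tk_model *tk)
{
    return tk->xtime_s + tk->wall_to_monotonic_s;
}

/*
 * Model of resume. With count_sleep_in_monotonic == 0 the sleep is
 * hidden from CLOCK_MONOTONIC (default behaviour); with a non-zero
 * value the two updates are skipped, as in the hack.
 */
static void model_resume(struct tk_model *tk, int64_t sleep_s,
                         int count_sleep_in_monotonic)
{
    assert(sleep_s >= 0);

    tk->xtime_s += sleep_s; /* wall time always catches up */
    if (!count_sleep_in_monotonic) {
        tk->wall_to_monotonic_s -= sleep_s;
        tk->total_sleep_s += sleep_s;
    }
}
```

Starting from xtime_s = 1000 and wall_to_monotonic_s = -900, a 4 second
sleep leaves the monotonic value at 100 in the default case and moves
it to 104 with the hack, while wall time becomes 1004 either way.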

It would be interesting to hear if that hack works for you, and we can
try to come up with a better way to think about how to accommodate both
views of how to account time over suspend.

Thomas, might this call for a new posix clock_id, CLOCK_BOOTTIME (ie:
CLOCK_MONOTONIC + total_sleep_time) or something that userland could use
to set timers on?
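
A clock id named CLOCK_BOOTTIME did not exist when this was written,
but later Linux kernels provide one with these semantics. On such a
system, a minimal userspace sketch of the relationship might look like
this (the helper names are made up, and the example is Linux-specific):

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Read a clock and return its value in nanoseconds. */
static int64_t clock_ns(clockid_t id)
{
    struct timespec ts;
    int rc = clock_gettime(id, &ts);

    assert(rc == 0);
    (void)rc;
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Approximate time spent suspended since boot: the difference between
 * the clock that counts sleep and the one that does not.
 */
static int64_t suspended_ns(void)
{
    int64_t mono = clock_ns(CLOCK_MONOTONIC);
    int64_t boot = clock_ns(CLOCK_BOOTTIME);

    return boot - mono;
}
```

The result includes the small delay between the two reads, so it is
only an estimate of the accumulated sleep time.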

thanks
-john




Timekeeping issue on aggressive suspend/resume

2010-06-09 Thread Suresh Rajashekara
I have an application (running on 2.6.29-omap1) which puts an OMAP1
system to suspend aggressively. The system wakes up every 4 seconds
and stays awake for about 35 milliseconds and sleeps again for another
4 seconds. This design is to save power on a battery operated device.

This aggressive suspend/resume action seems to create an issue for
other applications in the system waiting for some timeout to happen
(especially an application which is waiting using mq_timedreceive
and is supposed to time out every 30 seconds; it seems to wake up every
90 seconds). It seems like the timekeeping is not happening properly
inside the kernel.

If the suspend duration is changed from 4 seconds to 1 second, then
things work somewhat better. On reducing it to 0.5 seconds (which was
our earlier design on 2.6.16-rc3), the problem seems to disappear.

Is this expected?

Thanks in advance,

Suresh


Re: Timekeeping issue on aggressive suspend/resume

2010-06-09 Thread Thomas Gleixner
On Wed, 9 Jun 2010, Suresh Rajashekara wrote:

 I have an application (running on 2.6.29-omap1) which puts an OMAP1
 system to suspend aggressively. The system wakes up every 4 seconds
 and stays awake for about 35 milliseconds and sleeps again for another
 4 seconds. This design is to save power on a battery operated device.
 
 This aggressive suspend/resume action seems to create an issue for
 other applications in the system waiting for some timeout to happen
 (especially an application which is waiting using mq_timedreceive
 and is supposed to time out every 30 seconds; it seems to wake up every
 90 seconds). It seems like the timekeeping is not happening properly
 inside the kernel.
 
 If the suspend duration is changed from 4 seconds to 1 second, then
 things work somewhat better. On reducing it to 0.5 seconds (which was
 our earlier design on 2.6.16-rc3), the problem seems to disappear.
 
 Is this expected?

Yes, that's caused by the fact that suspend (via /sys/power/state)
freezes the kernel internal timers and the user space visible timers
which are based on CLOCK_MONOTONIC or jiffies (like mq_timedreceive on
your .29 kernel). Only CLOCK_REALTIME based timers are kept correct as
we have to align to the wall clock time.
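
For context, POSIX specifies the mq_timedreceive() timeout as an
absolute time measured against CLOCK_REALTIME, so an application
typically builds the deadline along these lines. This is a sketch of
the application side only; the helper name is made up, and how the
kernel then waits for that deadline is the part that differs between
kernel versions.

```c
#include <assert.h>
#include <time.h>

/* Return t advanced by ms milliseconds, keeping tv_nsec normalised. */
static struct timespec timespec_add_ms(struct timespec t, long ms)
{
    assert(ms >= 0 && t.tv_nsec >= 0 && t.tv_nsec < 1000000000L);

    t.tv_sec += ms / 1000;
    t.tv_nsec += (ms % 1000) * 1000000L;
    if (t.tv_nsec >= 1000000000L) {
        t.tv_sec += 1;
        t.tv_nsec -= 1000000000L;
    }
    return t;
}

/*
 * Typical use for a 30 second timeout (mqd, buf and len are the
 * caller's own queue descriptor and buffer):
 *
 *     struct timespec now, deadline;
 *     clock_gettime(CLOCK_REALTIME, &now);
 *     deadline = timespec_add_ms(now, 30000);
 *     mq_timedreceive(mqd, buf, len, NULL, &deadline);
 */
```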

The reason for this is that otherwise almost all timers would have
expired by the time we resume, and we would get a thundering herd of
apps and kernel facilities due to firing timeouts.

Another problem is that jiffies can wrap around on 32 bit systems
during a long suspend though I don't think that's a real world problem
as it takes between 49 and 497 days of suspend depending on the HZ
setting. So for your use case it would not matter.
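
The 49 and 497 day figures come from dividing the range of a 32-bit
counter by HZ. A minimal sketch of that arithmetic (the helper name is
made up for this illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Whole days until a 32-bit jiffies counter wraps at the given HZ. */
static uint64_t jiffies_wrap_days(uint64_t hz)
{
    assert(hz > 0);

    return (UINT64_C(1) << 32) / hz / 86400;
}
```

jiffies_wrap_days(1000) is 49 and jiffies_wrap_days(100) is 497, which
matches the range quoted above for HZ=1000 and HZ=100.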

I'm more concerned about code getting surprised by firing timers, as
the kernel has had this behaviour for a long time now.

Though we could change that conditionally - the default would still be
the freeze of jiffies and CLOCK_MONOTONIC for historical compatibility.

There will probably be some accounting issues: uptime, CPU time of the
suspend task and some others, but that needs to be found out.

Thanks,

tglx


