Randy Fishel wrote:

> On Mon, 13 Oct 2008, J?rgen Keil wrote:
> 
> > In usr/src/uts/i86pc/os/timestamp.c function tsc_suspend()
> >
> http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/i86pc/os/timestamp.c#tsc_suspend
> > 
> >     703                     if (tsc_delta_onsuspend) {
> >     704                             tsc_adjust_delta(tsc_saved_tsc);
> >     705                     } else {
> >     706                             tsc_adjust_delta(nsec_scale);
> >     707                     }
> > 
> > Can anyone explain why line 706 has to call tsc_adjust_delta() 
> > with parameter nsec_scale?
> > 
> > What exactly is this call supposed to adjust?  The argument for
> > tsc_adjust_delta() is a number of cpu clock ticks, but nsec_scale
> > is (10^9 * 2^27 / cpu_freq). It's not a number of tsc ticks but some
> > scale factor.
> > 
> > On my Tecra S1 Pentium-M 1.3 GHz laptop this adds
> > (10^9*2^27)/1300000000 == 103244406 clock ticks
> > (or 0.079 seconds).    But why?
> > 
> > The tsc_adjust_delta(nsec_scale) call appears bogus to me...
> 
> 
> I am mostly surprised that there isn't a belief that the adjust on 
> line 704 isn't bogus (or at least obsolete, which I am convinced it 
> is).

Line 704 shouldn't run, unless you've changed the tunable
tsc_delta_onsuspend = 1.

And I currently don't see why the call at line 704 is bogus,
in case we run with tsc_delta_onsuspend == 1.

> To answer these questions, we first need to  understand that 
> tsc_suspend() is called as we are about to turn off the cpu's on 
> suspend, and therfore stop the tsc leading to a serious time 
> disconnect.  It is in this function where we take a snapshot of the 
> current TOD and tsc, so we can do restoration on resume.
> 
> On resume, the tsc was reset by the cpu power cycle, so timekeeping 
> an no longer be done solely on the tsc, but via an adjusted 
> "effective" tsc that takes into consideration the suspend tsc.  As we 
> have less control of startup vs. shutdown, we choose to make the 
> changeover on suspend, and tsc_adjust_delta() does this for us. 

The way I understand the code is that the tsc adjustment is done on
*resume*, in tsc_resume()  (assuming tsc_delta_onsuspend == 0).

In tsc_resume() It adjusts by the tsc saved at suspend time
(tsc_saved_tsc), plus the number of tsc ticks that would have
occurred if the machine would have been running all the time
instead of sleeping in S3 state for "sleep_sec" seconds.


With tsc_delta_onsuspend == 1, it changes slightly: tsc_suspend
adjusts by the tsc counter before power is removed, and in
tsc_resume it adjusts by the amount of tsc ticks equivalent to
the number of seconds in s3 sleep state.  And in this variant it
doesn't have to adjust by nsec_scale !?!

> It is 
> also useful to be able to account for the tsc value between the 
> snapshot and the actual suspend,

That should be included in the "sleep_sec" that is computed
in tsc_resume() by looking at the time-of-day clock.  But that
only has full second resolution...

> but just as importantly, when we 
> switch to the "effective" value, the new tsc must not be less than the 
> previous.

That shouldn't be a problem, because the code always adds
at least a full second of tsc ticks, in tsc_resume():

                        /*
                         * If the saved sec is less that or equal to
                         * the current ts, then there is likely a
                         * problem with the clock.  Assume at least
                         * one second has passed, so that time goes forward.
                         */
                        if (sleep_sec <= 0) {
                                sleep_sec = 1;
                        }

                        /* How many TSC's should have occured while sleeping */
                        if (tsc_adjust_seconds)
                                sleep_tsc = sleep_sec * cpu_freq_hz;


> Investigation on early machines showed that the tsc roughly 
> incremented by nsec_scale, so that was the value choosen to adjust the 
> new scale.  In theory, it also tracks what the actual machine is 
> doing.  However, as timekeeping from this point till the next power-on 
> is quite course in comparison (only as accurate as the internal TOD), 
> this value is pretty trivial, and a small constant could have done 
> just as well.
> 
> 
> 
>   Does this make more sense?

No.

Btw:

I took a closer look at this code because my Tecra S1 was frequently
hanging during S3 resume on the second suspend/resume cycle.
The first suspend/resume was working fine; but in most (but not all)
cases the second suspend/resume was hanging during s3 resume.

I traced that to the |delay(drv_usectohz(10000));| call
in pci_pre_resume().  delay() apparently needs a working
cyclic clock subsystem, but the cyclic clock subsystem didn't
properly resume, clock() wasn't called, and for that reason lbolt
didn't get incremented.  All of that is needed for a working delay().


I think the root cause is the  |cpu_enable_intr(CPU)| call for the
boot cpu in cpr_impl.c i_cpr_power_down().  On an uppc machine
(the Tecra S1 in an uppc machine) this calls mp_enable_intr()
which lowers the spl with a call to spl0().  spl0() unmasks clock
interrupts on IRQ0. That results in clock interrupts reaching the
cyclic subsystem before the tsc based highres timestamp
kernel code is resumed with tsc_resume().

I guess this confuses the cyclic subsystem on resume,
(which will see timestamps jumping forward by a huge
amount).



I've experimented with resuming the tsc in i_cpr_power_down(),
immediately before enabling interrupts on the boot cpu.
With such a change, the tecra s1 is now able to suspend/resume
without problem in a loop for 20 times ...


diff --git a/usr/src/uts/i86pc/os/cpr_impl.c b/usr/src/uts/i86pc/os/cpr_impl.c
--- a/usr/src/uts/i86pc/os/cpr_impl.c
+++ b/usr/src/uts/i86pc/os/cpr_impl.c
@@ -664,6 +664,8 @@
 
                return (ret);
        } else {
+               extern void tsc_resume(void);
+
                cpr_suspend_succeeded = 1;
 
                power_req.request_type = PMR_PPM_EXIT_SX;
@@ -686,6 +688,8 @@
                if (cpr_panic)
                        PMD(PMD_SX, ("xxx", cpr_suspend_succeeded /
                            (cpr_panic - 1)));
+
+               tsc_resume();
 
                /*
                 * Enable interrupts on boot cpu.
--
This message posted from opensolaris.org

Reply via email to