Randy Fishel wrote: > On Mon, 13 Oct 2008, J?rgen Keil wrote: > > > In usr/src/uts/i86pc/os/timestamp.c function tsc_suspend() > > > http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/i86pc/os/timestamp.c#tsc_suspend > > > > 703 if (tsc_delta_onsuspend) { > > 704 tsc_adjust_delta(tsc_saved_tsc); > > 705 } else { > > 706 tsc_adjust_delta(nsec_scale); > > 707 } > > > > Can anyone explain why line 706 has to call tsc_adjust_delta() > > with parameter nsec_scale? > > > > What exactly is this call supposed to adjust? The argument for > > tsc_adjust_delta() is a number of cpu clock ticks, but nsec_scale > > is (10^9 * 2^27 / cpu_freq). It's not a number of tsc ticks but some > > scale factor. > > > > On my Tecra S1 Pentium-M 1.3 GHz laptop this adds > > (10^9*2^27)/1300000000 == 103244406 clock ticks > > (or 0.079 seconds). But why? > > > > The tsc_adjust_delta(nsec_scale) call appears bogus to me... > > > I am mostly surprised that there isn't a belief that the adjust on > line 704 isn't bogus (or at least obsolete, which I am convinced it > is).
Line 704 shouldn't run, unless you've changed the tunable tsc_delta_onsuspend = 1. And I currently don't see why the call at line 704 is bogus, in case we run with tsc_delta_onsuspend == 1. > To answer these questions, we first need to understand that > tsc_suspend() is called as we are about to turn off the cpu's on > suspend, and therfore stop the tsc leading to a serious time > disconnect. It is in this function where we take a snapshot of the > current TOD and tsc, so we can do restoration on resume. > > On resume, the tsc was reset by the cpu power cycle, so timekeeping > an no longer be done solely on the tsc, but via an adjusted > "effective" tsc that takes into consideration the suspend tsc. As we > have less control of startup vs. shutdown, we choose to make the > changeover on suspend, and tsc_adjust_delta() does this for us. The way I understand the code is that the tsc adjustment is done on *resume*, in tsc_resume() (assuming tsc_delta_onsuspend == 0). In tsc_resume() It adjusts by the tsc saved at suspend time (tsc_saved_tsc), plus the number of tsc ticks that would have occurred if the machine would have been running all the time instead of sleeping in S3 state for "sleep_sec" seconds. With tsc_delta_onsuspend == 1, it changes slightly: tsc_suspend adjusts by the tsc counter before power is removed, and in tsc_resume it adjusts by the amount of tsc ticks equivalent to the number of seconds in s3 sleep state. And in this variant it doesn't have to adjust by nsec_scale !?! > It is > also useful to be able to account for the tsc value between the > snapshot and the actual suspend, That should be included in the "sleep_sec" that is computed in tsc_resume() by looking at the time-of-day clock. But that only has full second resolution... > but just as importantly, when we > switch to the "effective" value, the new tsc must not be less than the > previous. That shouldn't be a problem, because the code always adds at least a full second of tsc ticks, in tsc_resume(): /* * If the saved sec is less that or equal to * the current ts, then there is likely a * problem with the clock. Assume at least * one second has passed, so that time goes forward. */ if (sleep_sec <= 0) { sleep_sec = 1; } /* How many TSC's should have occured while sleeping */ if (tsc_adjust_seconds) sleep_tsc = sleep_sec * cpu_freq_hz; > Investigation on early machines showed that the tsc roughly > incremented by nsec_scale, so that was the value choosen to adjust the > new scale. In theory, it also tracks what the actual machine is > doing. However, as timekeeping from this point till the next power-on > is quite course in comparison (only as accurate as the internal TOD), > this value is pretty trivial, and a small constant could have done > just as well. > > > > Does this make more sense? No. Btw: I took a closer look at this code because my Tecra S1 was frequently hanging during S3 resume on the second suspend/resume cycle. The first suspend/resume was working fine; but in most (but not all) cases the second suspend/resume was hanging during s3 resume. I traced that to the |delay(drv_usectohz(10000));| call in pci_pre_resume(). delay() apparently needs a working cyclic clock subsystem, but the cyclic clock subsystem didn't properly resume, clock() wasn't called, and for that reason lbolt didn't get incremented. All of that is needed for a working delay(). I think the root cause is the |cpu_enable_intr(CPU)| call for the boot cpu in cpr_impl.c i_cpr_power_down(). On an uppc machine (the Tecra S1 in an uppc machine) this calls mp_enable_intr() which lowers the spl with a call to spl0(). spl0() unmasks clock interrupts on IRQ0. That results in clock interrupts reaching the cyclic subsystem before the tsc based highres timestamp kernel code is resumed with tsc_resume(). I guess this confuses the cyclic subsystem on resume, (which will see timestamps jumping forward by a huge amount). I've experimented with resuming the tsc in i_cpr_power_down(), immediately before enabling interrupts on the boot cpu. With such a change, the tecra s1 is now able to suspend/resume without problem in a loop for 20 times ... diff --git a/usr/src/uts/i86pc/os/cpr_impl.c b/usr/src/uts/i86pc/os/cpr_impl.c --- a/usr/src/uts/i86pc/os/cpr_impl.c +++ b/usr/src/uts/i86pc/os/cpr_impl.c @@ -664,6 +664,8 @@ return (ret); } else { + extern void tsc_resume(void); + cpr_suspend_succeeded = 1; power_req.request_type = PMR_PPM_EXIT_SX; @@ -686,6 +688,8 @@ if (cpr_panic) PMD(PMD_SX, ("xxx", cpr_suspend_succeeded / (cpr_panic - 1))); + + tsc_resume(); /* * Enable interrupts on boot cpu. -- This message posted from opensolaris.org