On Thu, Jun 11, 2020 at 04:50:40AM +0000, Taylor R Campbell wrote:
> What's trickier is synchronizing per-CPU timecounters so that they all
> give a reasonably consistent view of absolute wall clock time -- and
> so it's not just one CPU that leads while the others play catchup
> every time they try to read the clock. (In other words, adding atomic
> catchup logic certainly does not obviate the need to synchronize
> per-CPU timecounters!)
>
> But neither synchronization nor global monotonicity is always
> necessary -- e.g., for rusage we really only need a local view of time
> since we're only measuring relative time durations spent on the
> current CPU anyway.
>
> > > This is what the timecounter(9) API per se expects of timecounters,
> > > and right now tsc (along with various other per-CPU cycle counters)
> > > fails to guarantee that.
> >
> > Howso, do you see a bug? I think it's okay. The TSC is only used for the
> > timecounter where it's known that it's insensitive to core speed variations
> > and is driven by PLL related to the bus clock. Fortunately that means most
> > x86 systems, excepting a window of some years from roughly around the time
> > of the Pentium 4 onwards.
>
> If tc_get_timecount goes backward by a little, e.g. because you
> queried it on cpu0 the first time and on cpu1 the second time,
> kern_tc.c will interpret that to mean that it has instead jumped
> forward by a lot -- nothing in the timecounter abstraction copes with
> a timecounter that goes backwards at all.

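For reference, the failure mode described in the quoted text can be
sketched in a few lines of C. This is only an illustration, assuming a
32-bit counter whose delta is computed by masked unsigned subtraction;
counter_delta and its parameters are hypothetical names and this is not
code taken from kern_tc.c.

```c
#include <stdint.h>

/*
 * Hypothetical helper: difference between two samples of a wrapping
 * counter, reduced to the counter's width with a mask.  Unsigned
 * subtraction cannot produce a negative result, so a sample that is
 * slightly behind the previous one yields a value just short of a
 * full wrap instead of a small negative step.
 */
static uint32_t
counter_delta(uint32_t now, uint32_t last, uint32_t mask)
{
	return (now - last) & mask;
}
```

With an assumed full 32-bit mask, counter_delta(1010, 1000, 0xffffffff)
is 10, while counter_delta(990, 1000, 0xffffffff) is 4294967286: a step
of 10 ticks backwards reads as a jump of almost 2^32 ticks forwards.
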
I thought about it some more and I just don't think we have this problem on
x86 anyway. The way I see it, with any counter, if you make explicit
comparisons on a global basis the counter could appear to go a tiny bit
backwards due to timing differences in execution, unless you want to go to
some lengths to work around that.

I think all you can really expect is for the clock to not go backwards within
a single thread of execution. By my understanding that's all the timecounter
code expects, and the TSC code on x86 makes sure of that. I changed
tsc_get_timecount so it'll print a message if the counter is ever observed
going backwards.

> (There's also an issue where the `monotonic' clock goes backwards
> sometimes, as reported by sched_pstats. I'm not sure anyone has
> tracked down where that's coming from -- it seems unlikely to be
> related to cross-CPU tsc synchronization because lwp rtime should
> generally be computed from differences between samples on a single CPU
> at a time, but I don't know.)

Hmm. There was a race condition with rusage and softints that I fixed about
6 months ago, where proc0 had absurd times in ps/top, but I have not seen the
"clock has gone backwards" one in a long time. I wonder if it's related.

Andrew