On Sat, Feb 11, 2012 at 22:16, Chuck Swiger <[email protected]> wrote:
> On Feb 11, 2012, at 11:58 AM, Dave Hart wrote:
>> On Sat, Feb 11, 2012 at 17:17, Chuck Swiger <[email protected]> wrote:
>>>> Have you tried to time the minimum clock reading time with RDTSC
>>>> or GetPerformance* counter calls?
>>>>
>>>> I wrote a tiny test program on my Win7-64 laptop, it got:
>>>>
>>>> Reading the system clock 10000000 times, minimum reading time =
>>>> 24 clock cycles,
>>>> minimum OS step = 0 ticks, maximum OS step = 10000 ticks
>>>>
>>>> The clock frequency is 2.7 GHz or so, the FileTime ticks should be
>>>> 100 ns each, so my OS clock is making 1 ms steps, while the clock
>>>> reading time is down to less than 10 ns!
>>>
>>> Well, the code above is not reading a clock; you're reading the
>>> stored value of what time it was when the kernel's scheduler last
>>> updated that value.  When the OS scheduler ticks, it reads a clock,
>>> then trims down the precision to the scheduler quantum (i.e., 1 ms
>>> for HZ=1000), and stores it in a "commpage", which is mapped RO
>>> into userland processes.
>>
>> Terje's code is reading the only clock available on Windows.
>
> Terje's GetSystemTimeAsFileTime() was exactly what I described
> above: a userland function which looks up a pre-stored value
> which gets updated periodically by the kernel, but is not actually
> calling a real clock which will return the time as seen when the
> clock is read *right* *then*.
>
>> It may not be what you think of as reading a clock based on your
>> understanding of other operating systems, but Windows isn't
>> necessarily the same as other operating systems.
>
> A clock isn't a stored value on a memory page, even if that value
> gets periodically updated within the system scheduler on a per-HZ
> interrupt, via the Windows multimedia timer, or whatever.
>
> A clock is an oscillator and a counter.  (Go read VMware's
> "Timekeeping-In-VirtualMachines.pdf" or PHK's
> "timecounter.pdf" for a considerably more detailed description
> and examples if this is unclear.)

By your definition, NTP was developed and used for quite a few years
on operating systems which lacked a clock.  I have to say, as
impressed as I have always been with NTP and Dr. Mills, I'm even more
impressed to know he spent a decade doing the impossible.

PHK's timecounter.pdf [1] (circa 2002-2004 [1][2]) says "We can
therefore with good fidelity define 'a clock' to be the combination of
an oscillator and a counting mechanism."  Nearly every computer system
that has _ever_ provided time-of-day to applications meets that
definition, and so does the classic tick-based software clock I
described.  The oscillator was in some cases simply the 50/60 Hz AC
supply, and much more commonly in the home/personal-computer era a
quartz crystal paired with a hardware counter circuit configured as a
divider, feeding an interrupt line triggered 10 to 100 times per
second.  The counting mechanism was the tick interrupt handler, which
incremented the software clock by the appropriate amount, 1/10th to
1/100th of a second (plus or minus a smidge from adjtime use, where
present).
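To make that concrete, here is a minimal sketch of such a tick-driven
software clock, in C.  All names are hypothetical rather than taken
from any particular kernel, and the handler is assumed to be wired to
a 100 Hz timer interrupt:

/* Minimal sketch of a classic tick-based software clock, assuming a
 * hardware divider fires tick_interrupt() 100 times per second.
 * Names are hypothetical, not from any particular kernel. */
#include <stdint.h>

#define TICK_HZ        100                      /* interrupts per second */
#define NSEC_PER_TICK  (1000000000L / TICK_HZ)  /* 10 ms per tick */

static volatile int64_t clock_sec;   /* seconds since the epoch */
static volatile int64_t clock_nsec;  /* nanoseconds within the second */

/* Counting mechanism: called from the timer interrupt, advances the
 * software clock by one tick (adjtime slew omitted for brevity). */
void tick_interrupt(void)
{
    clock_nsec += NSEC_PER_TICK;
    if (clock_nsec >= 1000000000L) {
        clock_nsec -= 1000000000L;
        clock_sec++;
    }
}

/* Reading the clock just fetches the stored value; no hardware is
 * interrogated at read time. */
void read_clock(int64_t *sec, int64_t *nsec)
{
    *sec  = clock_sec;
    *nsec = clock_nsec;
}

A real kernel must also guard readers against torn reads of the two
fields and fold in any adjtime() slew; the point is only that PHK's
oscillator-plus-counter definition is met without interrogating
hardware when the clock is read.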
As for the VMware paper [3], it looks quite informative and detailed,
but I seriously doubt there is anything in it that says anything
remotely like "a clock is never implemented in software, and is always
a (possibly virtualized) hardware counter interrogated at the time of
reading."  Perhaps there is something particular you'd like to draw
attention to in that voluminous paper to bolster such a claim?

>> 15 years ago, most POSIX-style OSes used a simple tick-based system
>> clock like Windows that was very fast to read, though typically not
>> as fast as Windows' because the current time wasn't mapped into
>> unprivileged memory of each process, so the time to read the clock
>> was dominated by the system call overhead of transitioning to and
>> from kernel mode/code, probably a couple of orders of magnitude
>> more expensive than actually reading the stored current clock value
>> in the kernel.
>
> [ ...followed by a long disagreement based on your assessment of
> my experience with "POSIX-style OSes"... ]
>
> The use of a "commpage" (that's a Mac term, the Linux equivalent
> appears to be "vsyscall page")

Or simply "shared memory": one or more pages mapped into more than one
logical address space simultaneously.

> to store a low-resolution approximation to "now" was used in pre-OSX
> MacOS and in Linux, but it isn't being used under FreeBSD.
> And "Windows Services for Unix" claims to be POSIX-compliant or at
> least "POSIXy" for NT/Win2k/XP/etc, so the distinction you're drawing
> just doesn't appear to make sense.

Windows SFU (ahem) is not Windows.  NT is a microkernel design (in the
Mach sense, not the NTP sense) with multiple OS personalities built on
top of it.  My discussion relates only to the Windows subsystem of NT,
which has at times sported OS/2 1.x as well as POSIX/SFU subsystems
operating side by side with the Windows subsystem, not on top of it.
The only thing special about the Windows subsystem compared to the
others is that it owns the console/GUI, and the other subsystems' I/O
is funneled through it.  The only distinction I'm trying to draw is
simply that you are wrong to claim a software clock that does not
involve interrogating hardware is not a clock.

> Look, I admire the notion of quibbling over details, but not when it
> is used to obscure the central point rather than help resolve it.

I'm not attempting to obscure any point, central or not.  I'm saying
you're wrong about GetSystemTimeAsFileTime not reading a clock.  That
is the only way a Windows API program can read the Windows clock, and
Terje was perfectly correct to use it.
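For anyone following along at home, here is a minimal sketch in the
spirit of Terje's test, showing that one way to read the Windows clock
and the quantized steps the stored value takes.  This is my
reconstruction for illustration, not Terje's actual program:

/* Spin on GetSystemTimeAsFileTime() and report the smallest nonzero
 * step the stored clock value makes.  My reconstruction, not Terje's
 * actual test code. */
#include <windows.h>
#include <stdio.h>

static unsigned long long read_ft(void)
{
    FILETIME       ft;
    ULARGE_INTEGER u;

    GetSystemTimeAsFileTime(&ft);       /* 100 ns units since 1601 UTC */
    u.LowPart  = ft.dwLowDateTime;
    u.HighPart = ft.dwHighDateTime;
    return u.QuadPart;
}

int main(void)
{
    unsigned long long prev = read_ft();
    unsigned long long min_step = ~0ULL;
    int i;

    for (i = 0; i < 10000000; i++) {
        unsigned long long now = read_ft();
        if (now != prev && now - prev < min_step)
            min_step = now - prev;
        prev = now;
    }
    printf("minimum nonzero OS step = %llu ticks (100 ns units)\n",
           min_step);
    return 0;
}

On Terje's laptop the steps were 10000 ticks (1 ms); the exact figure
depends on the current Windows timer resolution, but either way each
read costs only nanoseconds while the stored value advances in
millisecond-scale jumps.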
>>> Also please note that you can't just call rdtsc by itself and get
>>> good results.  At the very least, you want to put a serializing
>>> instruction like cpuid first, and secondly, you really want to call
>>> rdtsc three times, and verify that t1 <= t2 <= t3, otherwise you
>>> have a chance of getting bogus time results due to your thread
>>> switching to another processor core between calls, CPU power-
>>> state changes, chipset bugs, interference from SMC interrupts,
>>> and PHK knows what else. :-)
>>
>> Not on modern AMDs, or any Intel, as far as my admittedly sub-PHK
>> understanding goes.  AMD really screwed the pooch by allowing the
>> TSC to vary between processors and vary with power state, causing
>> all sorts of headaches for all sorts of software.  Even on buggy
>> systems, reading the TSC once is enough if you've locked the thread
>> to a single logical processor.
>
> See http://en.wikipedia.org/wiki/Time_Stamp_Counter and references.

Thanks for the pointer.  I learned that some Intel processors I didn't
have experience with also suffer power-state-related TSC issues.  I
also note it's a Wikipedia article, which means its reliability as an
authority is questionable at best.  Some articles have thriving
collaboration and put other encyclopedic articles to shame.  Others
suffer from inaccuracies and self-contradiction due to a
less-than-optimum level of interest in editing encyclopedias among
those expert in the topic.  This article is a mixed bag in that
regard.  Much of the summary at the top is solidly on-target, but some
of it is not.  For example, the paragraph mentioning
clock_gettime(CLOCK_MONOTONIC) and QueryPerformanceCounter is good
stuff, pointing out they provide similar capabilities without the AMD
(and, to a much lesser extent, Intel) fast-and-loose fallout.  On the
other hand, the second paragraph is outdated and overgeneralizes,
contradicted by information later in the same article pointing out
that newer processors have TSCs that don't suffer divergence across
logical processors or power-saving-induced rate changes.  For those
targeting only recent systems, or only non-power-saving Intel systems,
the TSC is both stable and often much faster to read than either QPC
or CLOCK_MONOTONIC, thanks to avoiding system call overhead.

If your goal in reading the TSC is to timestamp some event that just
occurred and calculate a seconds-enumerated timestamp from it, using a
serializing instruction first is counterproductive.  If your goal is
to read the TSC before and after some sequence of instructions and
later subtract the two TSC values to measure the duration of the
sequence, using a serializing instruction is likely wise.  That is,
"good results" may or may not mandate serializing before reading the
TSC, depending on the context.
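To illustrate the duration-measurement case, here is a sketch using
compiler intrinsics.  It assumes MSVC's <intrin.h> (GCC offers
equivalents) and a thread already locked to one logical processor
whose TSC is stable:

/* Sketch of the duration-measurement case: serialize with CPUID
 * before each RDTSC so earlier in-flight instructions can't drift
 * past the read.  Assumes MSVC's __cpuid/__rdtsc intrinsics and a
 * stable TSC on the core this thread is locked to. */
#include <intrin.h>
#include <stdio.h>

static unsigned __int64 serialized_rdtsc(void)
{
    int regs[4];

    __cpuid(regs, 0);   /* CPUID is serializing */
    return __rdtsc();
}

int main(void)
{
    unsigned __int64 t0, t1;
    volatile int     sink = 0;
    int              i;

    t0 = serialized_rdtsc();
    for (i = 0; i < 1000; i++)   /* the sequence being measured */
        sink += i;
    t1 = serialized_rdtsc();

    printf("measured sequence took %llu cycles\n",
           (unsigned long long)(t1 - t0));
    return 0;
}

For the timestamp-an-event case, drop the __cpuid call and use a bare
__rdtsc(), accepting a few cycles of out-of-order slop rather than
stalling the pipeline before taking the stamp.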
[1] http://phk.freebsd.dk/pubs/timecounter.pdf
[2] http://phk.freebsd.dk/pubs/
[3] http://www.vmware.com/files/pdf/Timekeeping-In-VirtualMachines.pdf

Clock me over the head,
Dave Hart
_______________________________________________
questions mailing list
[email protected]
http://lists.ntp.org/listinfo/questions