On Sat, 18 Jul 2015, Mark Johnston wrote:
> Log:
>   Pass the lock object to lockstat_nsecs() and return immediately if
>   LO_NOPROFILE is set. Some timecounter handlers acquire a spin mutex,
>   and we don't want to recurse if lockstat probes are enabled.
It is an error to call timecounter code from a low-level place like the mutex implementation. This workaround depends on all locks in timecounter handlers being LO_NOPROFILE, and that breaks profiling of these locks. The breakage is largest if the locks are used for more than timecounters. E.g., the one for i386's obsolescent i8254 timer has to be used for all accesses to the i8254. This lock is configured as MTX_SPIN | MTX_NOPROFILE and is perhaps the main one fixed by this commit.

KTR handles this problem badly in a different way: it abuses get_cyclecount(). The abuse is parametrized by defining KTR_TIME as get_cyclecount() and using KTR_TIME. The definition of KTR_TIME is ifdefed, but not in a usable way. This is of course undocumented, and KTR_TIME is not a normal kernel option, so it is hard to define in config files.

KTR needs to be about as careful about this as lockstat, since it is called from low-level mutex code for debugging. I think there is nothing like LO_NOPROFILE to turn it off. Of course you can turn off the mutex code's calls to it using MTX_QUIET or MTX_NOWITNESS, but this breaks much more than lock profiling. (I rarely use any lock debugging, but when I did I found it useful to run without WITNESS_SKIPSPIN and with MTX_QUIET and MTX_NOWITNESS turned off for the locks being debugged. Some console drivers use spinlocks with MTX_QUIET or MTX_NOWITNESS to avoid various problems, and this breaks witnessing of them even for non-console uses. Unbreaking their witnessing uncovered further bugs.)

Abusing get_cyclecount() in KTR doesn't even fix the problem with timecounters, since some implementations of get_cyclecount() (some arm, some i386) use the timecounter.
Other bugs with using get_cyclecount():

- Even if it returns cycles counted by a clock, the frequency of this clock is unknown/unobtainable, so there is no way to convert to human-readable units.
- Some arches return a shifted binuptime() that wraps very quickly (256 seconds for arm with ARM_ARCH < 6). Such counters cannot be monotonic.
- Some arches (arm again?) used to use binuptime() with swizzling more complicated than shifting. Their counters were even further from being monotonic.
- arm is now the only arch that uses binuptime() at a low level. arm64 uses a stub that returns 1. ARM_ARCH < 6 does the above. i386 returns cpu_ticks(). This is the correct method: it uses the hardware counter used by cpu_ticks() if possible, else the raw value of the current hardware timecounter, adjusted to avoid wrap. (It is not actually correct, since its implementation is no good. The wrap adjustment is done with no locking. Also, switching the timecounter hardware breaks the get_cyclecount() use even more than the thread resource usage use. The latter has incomplete recalibration.)
- KTR of course has no support for converting to human-readable units. ktrdump prints raw timestamps and their differences in %16ju format. kern_ktr.c seems to print them only in a ddb command. This prints the raw value in the much worse format %10.10lld, which uses the long long abomination and has a sign error (ktr_timestamp has type uint64_t).
- KTR's "timestamps" are thus at best a cookie that increases strictly monotonically.
- get_cyclecount() is always rdtsc() on modern x86. rdtsc() is not a serializing instruction. I recently learned how unserial it can be -- hundreds of cycles on freefall. The hundred-cycle case probably doesn't happen much in KTR's use. It happens in the test program when the program spins waiting for something and executes rdtsc() after that, or so it thinks. The CPU actually executes rdtsc() speculatively, hundreds of cycles earlier, while spinning.
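The sign error mentioned above is the usual one: ktr_timestamp is uint64_t, so it must be printed with an unsigned conversion. A sketch of the portable fix, matching the style of ktrdump's %16ju (the helper name is made up for illustration):

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a uint64_t timestamp portably: cast to uintmax_t and print
 * with %ju, as ktrdump does.  ddb's "%10.10lld" both assumes the
 * long long type and uses a signed conversion, so values with the
 * high bit set would print as negative.
 */
static int
format_ts(char *buf, size_t len, uint64_t ktr_timestamp)
{
	return (snprintf(buf, len, "%ju", (uintmax_t)ktr_timestamp));
}
```

With %ju even UINT64_MAX round-trips correctly, where a signed %lld would print -1.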
Back-to-back rdtsc()s seem to be executed in order on freefall, and the speculation seems to extend only to the first one. So there is a good chance that get_cyclecount() called on the same CPU gives a strictly increasing value. But across CPUs, hundred-cycle differences are the usual case if there is no synchronization of the threads (even when the TSCs on the different CPUs are perfectly synchronized at the hardware level).

- get_cyclecount() doesn't bother with synchronization if it uses the TSC (or equivalent). This is a feature: it is supposed to be efficient even if this requires it to return garbage. But this makes it unusable for almost everything. Certainly for timestamps.
- get_cyclecount()'s documentation has rotted to match its misuse. It is now documented as returning a value that is monotonically increasing on each CPU. But it never obviously did that on x86 (rdtsc isn't a serializing instruction, and the implementation doesn't even have a compiler barrier), and it doesn't do that on arm < 6 (wrap occurs after 256 seconds). Its documentation does a good job of describing its non-monotonicity across CPUs. cpu_ticks() also doesn't require any synchronization across CPUs, but this is less clear since it has null documentation.

KTR shouldn't try to synchronize threads to get timestamps that are monotonic across CPUs relative to some shared (non-relativistic) clock, although this would be useful for debugging synchronization: doing so would be slow (100-200 cycles per sync on freefall) and would break debugging of synchronization bugs by adding synchronization.

Timecounters give timestamps that are almost comparable across CPUs, even when they use the TSC timecounter. They do this by reading a shared hardware timecounter, perhaps even with a lock that serializes everything, or by reading something like a shared TSC that is not fully serialized. For the TSC, low-level code gives serialization for each CPU.
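The per-CPU serialization mentioned here is conventionally a fence issued before rdtsc. A sketch, with the usual caveats: this assumes lfence is dispatch-serializing (true on recent Intel CPUs, and on AMD when the OS enables that behavior), and the non-x86 fallback to CLOCK_MONOTONIC is only there to keep the sketch portable, it is not what the kernel does:

```c
#include <stdint.h>
#include <time.h>

/*
 * rdtsc is not a serializing instruction, so the CPU may execute it
 * speculatively, out of program order.  Issuing lfence first keeps the
 * read from being hoisted ahead of earlier instructions; the "memory"
 * clobber also gives a compiler barrier, which the plain get_cyclecount()
 * implementation lacks.
 */
static uint64_t
serialized_tsc(void)
{
#if defined(__x86_64__) || defined(__i386__)
	uint32_t lo, hi;

	__asm__ __volatile__("lfence; rdtsc"
	    : "=a" (lo), "=d" (hi) : : "memory");
	return ((uint64_t)hi << 32 | lo);
#else
	/* Portable stand-in for non-x86 hosts; not a cycle counter. */
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ((uint64_t)ts.tv_sec * 1000000000 + ts.tv_nsec);
#endif
}
```

This only serializes reads within one CPU; it does nothing for the cross-CPU comparability discussed above, which still depends on the TSCs being synced in hardware.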
The TSCs are hopefully synced at the hardware level, so that when you read them at almost the same physical time on separate CPUs you get almost the same value. The physical time might be affected relativistically, but CPUs are not yet quite fast enough for that to be a large effect. The largest effect is probably from pipeline delays.

Bruce
_______________________________________________
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"