Module Name:	src
Committed By:	msaitoh
Date:		Mon Jun 15 09:09:24 UTC 2020

Modified Files:
	src/sys/arch/amd64/amd64: cpufunc.S
	src/sys/arch/i386/i386: cpufunc.S
	src/sys/arch/x86/include: cpu_counter.h cpufunc.h
	src/sys/arch/x86/x86: cpu.c hyperv.c tsc.c tsc.h
	src/sys/rump/librump/rumpkern/arch/x86: rump_x86_cpu_counter.c

Log Message:
Serialize rdtsc with lfence, mfence or cpuid to read the TSC more precisely.

x86/x86/tsc.c rev. 1.67 reduced the cache problem and brought a big
improvement, but there is still room for more.

I measured the effect of lfence, mfence, cpuid and rdtscp. The impact on
TSC skew and/or drift is:

	AMD:   mfence > rdtscp > cpuid > lfence-serialize > lfence = nomodify
	Intel: lfence > rdtscp > cpuid > nomodify

So, mfence is the best on AMD and lfence is the best on Intel. If the CPU
has no SSE2, we can use cpuid.

NOTE:
 - An AMD document says the DE_CFG_LFENCE_SERIALIZE bit can be used for
   serializing, but it's not so good.
 - On Intel i386 (not amd64), the improvement seems to be very small.
 - The rdtscp instruction can be used as a serializing instruction + rdtsc,
   but it's not as good as [lm]fence. Both Intel's and AMD's documents say
   that the latency of rdtscp is bigger than that of rdtsc, so I suspect the
   difference in the results comes from that.

To generate a diff of this commit:
cvs rdiff -u -r1.60 -r1.61 src/sys/arch/amd64/amd64/cpufunc.S
cvs rdiff -u -r1.46 -r1.47 src/sys/arch/i386/i386/cpufunc.S
cvs rdiff -u -r1.6 -r1.7 src/sys/arch/x86/include/cpu_counter.h
cvs rdiff -u -r1.40 -r1.41 src/sys/arch/x86/include/cpufunc.h
cvs rdiff -u -r1.193 -r1.194 src/sys/arch/x86/x86/cpu.c
cvs rdiff -u -r1.9 -r1.10 src/sys/arch/x86/x86/hyperv.c
cvs rdiff -u -r1.50 -r1.51 src/sys/arch/x86/x86/tsc.c
cvs rdiff -u -r1.6 -r1.7 src/sys/arch/x86/x86/tsc.h
cvs rdiff -u -r1.1 -r1.2 \
    src/sys/rump/librump/rumpkern/arch/x86/rump_x86_cpu_counter.c

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
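
For readers unfamiliar with the technique named in the log message, the
following is a minimal userland sketch of what "serializing rdtsc" can look
like. It is not the code from this commit (the kernel versions live in
cpufunc.S), the function names are hypothetical, and it assumes an x86 CPU
with SSE2 and a GCC- or Clang-compatible compiler.

```c
#include <stdint.h>

/* Hypothetical helper: fence first, then read the TSC (per the log, best on Intel). */
static inline uint64_t
tsc_read_lfence(void)
{
	uint32_t lo, hi;

	/* lfence waits for earlier instructions to complete before rdtsc runs. */
	__asm__ volatile("lfence\n\trdtsc" : "=a"(lo), "=d"(hi) : : "memory");
	return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical helper: same idea with mfence (per the log, best on AMD). */
static inline uint64_t
tsc_read_mfence(void)
{
	uint32_t lo, hi;

	__asm__ volatile("mfence\n\trdtsc" : "=a"(lo), "=d"(hi) : : "memory");
	return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical helper: cpuid as the serializing fallback when SSE2 is absent. */
static inline uint64_t
tsc_read_cpuid(void)
{
	uint32_t lo = 0;	/* cpuid leaf 0 goes in through eax */
	uint32_t hi, ebx, ecx = 0;

	/* cpuid overwrites eax..edx; rdtsc then overwrites edx:eax with the TSC. */
	__asm__ volatile("cpuid\n\trdtsc"
	    : "+a"(lo), "=d"(hi), "=b"(ebx), "+c"(ecx)
	    :
	    : "memory");
	(void)ebx;
	return ((uint64_t)hi << 32) | lo;
}
```

A caller would pick one variant per CPU vendor at boot and use it on every
TSC read. The sketch does not attempt to reproduce the skew and drift
measurements reported above.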