Re: amd64: simplify TSC sync testing
Hello,

I can confirm the same behaviour with this patch applied.

$ sysctl kern.timecounter
kern.timecounter.tick=1
kern.timecounter.timestepwarnings=0
kern.timecounter.hardware=i8254
kern.timecounter.choice=i8254(0) tsc(-1000) acpihpet0(1000) acpitimer0(1000)

$ sysctl hw
hw.machine=amd64
hw.model=AMD Ryzen 5 3500U with Radeon Vega Mobile Gfx
hw.ncpu=8
hw.vendor=LENOVO
hw.version=ThinkPad E495

Regards,
Aslan

On Wed, Feb 02, 2022 at 01:51:18PM -0500, Dave Voutila wrote:
>
> Jason McIntyre writes:
>
> > On Wed, Feb 02, 2022 at 04:52:40PM +, Stuart Henderson wrote:
> >> This definitely wants testing on Ryzen ThinkPads (e.g.
> >> E485/E585/X395/T495s)
> >> or Inspiron 5505, I see user TSC disabled on a lot of those in dmesglog.
> >>
> >>
> >
> > hi.
> >
> > here are the results from a 5505. was the timecounter meant to switch
> > from tsc?
> >
> > jmc
> >
> > $ sysctl kern.timecounter
> > kern.timecounter.tick=1
> > kern.timecounter.timestepwarnings=0
> > kern.timecounter.hardware=i8254
> > kern.timecounter.choice=i8254(0) tsc(-1000) acpihpet0(1000) acpitimer0(1000)
> >
>
> I'm seeing the same issue...switching to i8254 pit where before it was
> using tsc. :(
>
> This is a Lenovo X13. dmesg and sysctl output follows.
>
> -dv
>
> $ sysctl kern.timecounter
> kern.timecounter.tick=1
> kern.timecounter.timestepwarnings=0
> kern.timecounter.hardware=i8254
> kern.timecounter.choice=i8254(0) tsc(-1000) acpihpet0(1000) acpitimer0(1000)
>
>
> OpenBSD 7.0-current (CUSTOM.MP) #4: Wed Feb 2 13:24:56 EST 2022
>     d...@kogelvis2.sisu.home:/usr/src/sys/arch/amd64/compile/CUSTOM.MP
> real mem = 16301219840 (15546MB)
> avail mem = 15664230400 (14938MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 3.2 @ 0xbf711000 (69 entries)
> bios0: vendor LENOVO version "R1CET63W(1.32 )" date 04/09/2021
> bios0: LENOVO 20UF0013US
> acpi0 at bios0: ACPI 6.3
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP SSDT SSDT SSDT IVRS SSDT SSDT TPM2 SSDT MSDM BATB
> HPET APIC MCFG SBST WSMT VFCT SSDT CRAT CDIT FPDT SSDT SSDT SSDT UEFI SSDT
> SSDT BGRT
> acpi0: wakeup devices GPP0(S3) RESA(S3) GPP4(S4) GPP5(S3) L850(S3) GPP6(S3)
> GPP7(S3) GP17(S3) XHC0(S3) XHC1(S3) LID_(S4) SLPB(S3)
> acpitimer0 at acpi0: 3579545 Hz, 32 bits
> acpihpet0 at acpi0: 14318180 Hz
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: AMD Ryzen 5 PRO 4650U with Radeon Graphics, 2096.39 MHz, 17-60-01
> cpu0:
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,PQM,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,IBPB,IBRS,STIBP,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu0: 32KB 64b/line 8-way I-cache, 32KB 64b/line 8-way D-cache, 512KB
> 64b/line 8-way L2 cache
> cpu0: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
> cpu0: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
> cpu0: apic clock running at 99MHz
> cpu0: mwait min=64, max=64, C-substates=1.1, IBE
> cpu1 at mainbus0: apid 1 (application processor)
> cpu1: AMD Ryzen 5 PRO 4650U with Radeon Graphics, 2096.06 MHz, 17-60-01
> cpu1:
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,PQM,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,IBPB,IBRS,STIBP,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu1: 32KB 64b/line 8-way I-cache, 32KB 64b/line 8-way D-cache, 512KB
> 64b/line 8-way L2 cache
> cpu1: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
> cpu1: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
> cpu1: smt 1, core 0, package 0
> cpu2 at mainbus0: apid 2 (application processor)
> tsc: cpu0/cpu2 sync round 1: 1093 regressions
> tsc: cpu0/cpu2 sync round 1: cpu0 lags cpu2 by 0 cycles
> tsc: cpu0/cpu2 sync round 1: cpu2 lags cpu0 by 21 cycles
> cpu2: AMD Ryzen 5 PRO 4650U with Radeon Graphics, 2096.06 MHz, 17-60-01
> cpu2:
>
Re: amd64: simplify TSC sync testing
Stuart Henderson writes:

> Thanks for testing.
>
> On 2022/02/02 13:51, Dave Voutila wrote:
>>
>> Jason McIntyre writes:
>>
>> > On Wed, Feb 02, 2022 at 04:52:40PM +, Stuart Henderson wrote:
>> >> This definitely wants testing on Ryzen ThinkPads (e.g.
>> >> E485/E585/X395/T495s)
>> >> or Inspiron 5505, I see user TSC disabled on a lot of those in dmesglog.
>> >>
>> >>
>> >
>> > hi.
>> >
>> > here are the results from a 5505. was the timecounter meant to switch
>> > from tsc?
>> >
>> > jmc
>> >
>> > $ sysctl kern.timecounter
>> > kern.timecounter.tick=1
>> > kern.timecounter.timestepwarnings=0
>> > kern.timecounter.hardware=i8254
>> > kern.timecounter.choice=i8254(0) tsc(-1000) acpihpet0(1000)
>> > acpitimer0(1000)
>> >
>>
>> I'm seeing the same issue...switching to i8254 pit where before it was
>> using tsc. :(
>
> There are two separate related things, one is the kernel choice, and the
> other is whether TSC can be used directly from userland for gettimeofday
> and friends without a syscall. Does the dmesg without the diff say "user
> TSC disabled"? If so then it was only using it in the kernel.
>

Yup. I believe this Ryzen laptop has always had userland TSC disabled. I
typically see:

cpu6: disabling user TSC (skew=-105)

-dv
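
One way to answer Stuart's question on an unpatched kernel is to scan a saved dmesg for the message Dave quotes. What follows is a small Python sketch, not anything from the patch. It assumes the line format is exactly the one shown above; `user_tsc_disabled` and `USER_TSC_RE` are hypothetical names chosen for illustration.

```python
import re

# Matches lines of the form quoted above, e.g.
#   cpu6: disabling user TSC (skew=-105)
USER_TSC_RE = re.compile(r"^(cpu\d+): disabling user TSC \(skew=(-?\d+)\)$")


def user_tsc_disabled(dmesg_text):
    """Return {cpu_name: skew} for every 'disabling user TSC' line found."""
    found = {}
    for line in dmesg_text.splitlines():
        match = USER_TSC_RE.match(line.strip())
        if match:
            found[match.group(1)] = int(match.group(2))
    return found
```

An empty result suggests the unpatched kernel left userland TSC enabled. The text to scan could come from a saved boot log such as /var/run/dmesg.boot, assuming that file is present on the machine being tested.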
Re: amd64: simplify TSC sync testing
> On Feb 2, 2022, at 13:29, Stuart Henderson wrote:
>
> Thanks for testing.
>
>> On 2022/02/02 13:51, Dave Voutila wrote:
>>
>> Jason McIntyre writes:
>>
>>> On Wed, Feb 02, 2022 at 04:52:40PM +, Stuart Henderson wrote:
>>>> This definitely wants testing on Ryzen ThinkPads (e.g. E485/E585/X395/T495s)
>>>> or Inspiron 5505, I see user TSC disabled on a lot of those in dmesglog.
>>>
>>> hi.
>>>
>>> here are the results from a 5505. was the timecounter meant to switch
>>> from tsc?
>>>
>>> jmc
>>>
>>> $ sysctl kern.timecounter
>>> kern.timecounter.tick=1
>>> kern.timecounter.timestepwarnings=0
>>> kern.timecounter.hardware=i8254
>>> kern.timecounter.choice=i8254(0) tsc(-1000) acpihpet0(1000) acpitimer0(1000)
>>>
>>
>> I'm seeing the same issue...switching to i8254 pit where before it was
>> using tsc. :(
>
> There are two separate related things, one is the kernel choice, and the
> other is whether TSC can be used directly from userland for gettimeofday
> and friends without a syscall. Does the dmesg without the diff say "user
> TSC disabled"? If so then it was only using it in the kernel.
>
> From reading the diff, I do expect that tsc priority is dropped if the
> measurements indicate problems, but I wonder why it falls back to i8254
> even though acpihpet/acpitimer are available and higher priority..

Because we drop the TSC quality after adding it as the active
timecounter. This violates assumptions in kern_tc.c.

If i8254 is added last, i8254 has a higher quality than the TSC and is
made the active counter. The other counters don't factor in because the
code assumes the active counter is the highest quality counter.

... which would be true if we weren't changing the quality after calling
tc_init().
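
The effect described in this message can be reproduced with a toy model. The Python sketch below is not the kern_tc.c code. It assumes only that registering a counter compares it against the currently active counter and switches when the new one has strictly higher quality. `TimecounterModel`, `replay_boot`, the registration order and the initial TSC quality of 2000 are assumptions for illustration; the final qualities are the ones in the sysctl output quoted above.

```python
class TimecounterModel:
    """Toy model of timecounter registration; not the kernel implementation."""

    def __init__(self):
        self.counters = {}   # name -> quality, in registration order
        self.active = None   # name of the currently active counter

    def tc_init(self, name, quality):
        """Register a counter; switch only if it beats the *active* counter."""
        self.counters[name] = quality
        if self.active is None or quality > self.counters[self.active]:
            self.active = name

    def set_quality(self, name, quality):
        """Change a quality after registration without re-running selection."""
        self.counters[name] = quality

    def best(self):
        """Counter a full re-selection over all registered counters would pick."""
        return max(self.counters, key=self.counters.get)


def replay_boot():
    """Replay an assumed boot order in which i8254 is registered last."""
    model = TimecounterModel()
    model.tc_init("acpitimer0", 1000)
    model.tc_init("acpihpet0", 1000)
    model.tc_init("tsc", 2000)         # assumed initial quality; becomes active
    model.set_quality("tsc", -1000)    # sync test found lag; active is unchanged
    model.tc_init("i8254", 0)          # 0 > -1000, so i8254 replaces the TSC
    return model
```

In this model `replay_boot()` ends with i8254 active even though two registered counters have quality 1000, because only the active counter is consulted when a new one arrives.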
Re: amd64: simplify TSC sync testing
Thanks for testing.

On 2022/02/02 13:51, Dave Voutila wrote:
>
> Jason McIntyre writes:
>
> > On Wed, Feb 02, 2022 at 04:52:40PM +, Stuart Henderson wrote:
> >> This definitely wants testing on Ryzen ThinkPads (e.g.
> >> E485/E585/X395/T495s)
> >> or Inspiron 5505, I see user TSC disabled on a lot of those in dmesglog.
> >>
> >>
> >
> > hi.
> >
> > here are the results from a 5505. was the timecounter meant to switch
> > from tsc?
> >
> > jmc
> >
> > $ sysctl kern.timecounter
> > kern.timecounter.tick=1
> > kern.timecounter.timestepwarnings=0
> > kern.timecounter.hardware=i8254
> > kern.timecounter.choice=i8254(0) tsc(-1000) acpihpet0(1000) acpitimer0(1000)
> >
>
> I'm seeing the same issue...switching to i8254 pit where before it was
> using tsc. :(

There are two separate related things, one is the kernel choice, and the
other is whether TSC can be used directly from userland for gettimeofday
and friends without a syscall. Does the dmesg without the diff say "user
TSC disabled"? If so then it was only using it in the kernel.

From reading the diff, I do expect that tsc priority is dropped if the
measurements indicate problems, but I wonder why it falls back to i8254
even though acpihpet/acpitimer are available and higher priority..

> This is a Lenovo X13. dmesg and sysctl output follows.
>
> -dv
>
> $ sysctl kern.timecounter
> kern.timecounter.tick=1
> kern.timecounter.timestepwarnings=0
> kern.timecounter.hardware=i8254
> kern.timecounter.choice=i8254(0) tsc(-1000) acpihpet0(1000) acpitimer0(1000)
>
>
> OpenBSD 7.0-current (CUSTOM.MP) #4: Wed Feb 2 13:24:56 EST 2022
>     d...@kogelvis2.sisu.home:/usr/src/sys/arch/amd64/compile/CUSTOM.MP
> real mem = 16301219840 (15546MB)
> avail mem = 15664230400 (14938MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 3.2 @ 0xbf711000 (69 entries)
> bios0: vendor LENOVO version "R1CET63W(1.32 )" date 04/09/2021
> bios0: LENOVO 20UF0013US
> acpi0 at bios0: ACPI 6.3
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP SSDT SSDT SSDT IVRS SSDT SSDT TPM2 SSDT MSDM BATB
> HPET APIC MCFG SBST WSMT VFCT SSDT CRAT CDIT FPDT SSDT SSDT SSDT UEFI SSDT
> SSDT BGRT
> acpi0: wakeup devices GPP0(S3) RESA(S3) GPP4(S4) GPP5(S3) L850(S3) GPP6(S3)
> GPP7(S3) GP17(S3) XHC0(S3) XHC1(S3) LID_(S4) SLPB(S3)
> acpitimer0 at acpi0: 3579545 Hz, 32 bits
> acpihpet0 at acpi0: 14318180 Hz
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: AMD Ryzen 5 PRO 4650U with Radeon Graphics, 2096.39 MHz, 17-60-01
> cpu0:
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,PQM,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,IBPB,IBRS,STIBP,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu0: 32KB 64b/line 8-way I-cache, 32KB 64b/line 8-way D-cache, 512KB
> 64b/line 8-way L2 cache
> cpu0: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
> cpu0: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
> cpu0: apic clock running at 99MHz
> cpu0: mwait min=64, max=64, C-substates=1.1, IBE
> cpu1 at mainbus0: apid 1 (application processor)
> cpu1: AMD Ryzen 5 PRO 4650U with Radeon Graphics, 2096.06 MHz, 17-60-01
> cpu1:
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,PQM,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,IBPB,IBRS,STIBP,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu1: 32KB 64b/line 8-way I-cache, 32KB 64b/line 8-way D-cache, 512KB
> 64b/line 8-way L2 cache
> cpu1: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
> cpu1: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
> cpu1: smt 1, core 0, package 0
> cpu2 at mainbus0: apid 2 (application processor)
> tsc: cpu0/cpu2 sync round 1: 1093 regressions
> tsc: cpu0/cpu2 sync round 1: cpu0 lags cpu2 by 0 cycles
> tsc: cpu0/cpu2 sync round 1: cpu2 lags cpu0 by 21 cycles
> cpu2: AMD Ryzen 5 PRO 4650U with Radeon Graphics, 2096.06 MHz, 17-60-01
> cpu2:
>
Re: amd64: simplify TSC sync testing
Jason McIntyre writes:

> On Wed, Feb 02, 2022 at 04:52:40PM +, Stuart Henderson wrote:
>> This definitely wants testing on Ryzen ThinkPads (e.g. E485/E585/X395/T495s)
>> or Inspiron 5505, I see user TSC disabled on a lot of those in dmesglog.
>>
>>
>
> hi.
>
> here are the results from a 5505. was the timecounter meant to switch
> from tsc?
>
> jmc
>
> $ sysctl kern.timecounter
> kern.timecounter.tick=1
> kern.timecounter.timestepwarnings=0
> kern.timecounter.hardware=i8254
> kern.timecounter.choice=i8254(0) tsc(-1000) acpihpet0(1000) acpitimer0(1000)
>

I'm seeing the same issue...switching to i8254 pit where before it was
using tsc. :(

This is a Lenovo X13. dmesg and sysctl output follows.

-dv

$ sysctl kern.timecounter
kern.timecounter.tick=1
kern.timecounter.timestepwarnings=0
kern.timecounter.hardware=i8254
kern.timecounter.choice=i8254(0) tsc(-1000) acpihpet0(1000) acpitimer0(1000)

OpenBSD 7.0-current (CUSTOM.MP) #4: Wed Feb 2 13:24:56 EST 2022
    d...@kogelvis2.sisu.home:/usr/src/sys/arch/amd64/compile/CUSTOM.MP
real mem = 16301219840 (15546MB)
avail mem = 15664230400 (14938MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.2 @ 0xbf711000 (69 entries)
bios0: vendor LENOVO version "R1CET63W(1.32 )" date 04/09/2021
bios0: LENOVO 20UF0013US
acpi0 at bios0: ACPI 6.3
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP SSDT SSDT SSDT IVRS SSDT SSDT TPM2 SSDT MSDM BATB HPET APIC MCFG SBST WSMT VFCT SSDT CRAT CDIT FPDT SSDT SSDT SSDT UEFI SSDT SSDT BGRT
acpi0: wakeup devices GPP0(S3) RESA(S3) GPP4(S4) GPP5(S3) L850(S3) GPP6(S3) GPP7(S3) GP17(S3) XHC0(S3) XHC1(S3) LID_(S4) SLPB(S3)
acpitimer0 at acpi0: 3579545 Hz, 32 bits
acpihpet0 at acpi0: 14318180 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD Ryzen 5 PRO 4650U with Radeon Graphics, 2096.39 MHz, 17-60-01
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,PQM,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,IBPB,IBRS,STIBP,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu0: 32KB 64b/line 8-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 8-way L2 cache
cpu0: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu0: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=1.1, IBE
cpu1 at mainbus0: apid 1 (application processor)
cpu1: AMD Ryzen 5 PRO 4650U with Radeon Graphics, 2096.06 MHz, 17-60-01
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,PQM,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,IBPB,IBRS,STIBP,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu1: 32KB 64b/line 8-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 8-way L2 cache
cpu1: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu1: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu1: smt 1, core 0, package 0
cpu2 at mainbus0: apid 2 (application processor)
tsc: cpu0/cpu2 sync round 1: 1093 regressions
tsc: cpu0/cpu2 sync round 1: cpu0 lags cpu2 by 0 cycles
tsc: cpu0/cpu2 sync round 1: cpu2 lags cpu0 by 21 cycles
cpu2: AMD Ryzen 5 PRO 4650U with Radeon Graphics, 2096.06 MHz, 17-60-01
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,PQM,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,IBPB,IBRS,STIBP,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu2: 32KB 64b/line 8-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 8-way L2 cache
cpu2: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu2: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu2: smt 0, core 1, package 0
cpu3 at mainbus0: apid 3 (application processor)
cpu3: AMD Ryzen 5 PRO 4650U with Radeon Graphics, 2096.06 MHz, 17-60-01
cpu3:
Re: amd64: simplify TSC sync testing
On Wed, Feb 02, 2022 at 04:52:40PM +, Stuart Henderson wrote:
> This definitely wants testing on Ryzen ThinkPads (e.g. E485/E585/X395/T495s)
> or Inspiron 5505, I see user TSC disabled on a lot of those in dmesglog.
>
>

hi.

here are the results from a 5505. was the timecounter meant to switch
from tsc?

jmc

$ sysctl kern.timecounter
kern.timecounter.tick=1
kern.timecounter.timestepwarnings=0
kern.timecounter.hardware=i8254
kern.timecounter.choice=i8254(0) tsc(-1000) acpihpet0(1000) acpitimer0(1000)

--- dmesg.old Wed Feb 2 17:44:00 2022
+++ dmesg.new Wed Feb 2 17:49:19 2022
@@ -1,7 +1,7 @@
-OpenBSD 7.0-current (GENERIC.MP) #20: Wed Feb 2 08:06:29 GMT 2022
+OpenBSD 7.0-current (GENERIC.MP) #21: Wed Feb 2 17:47:27 GMT 2022
     j...@manila.kerhand.co.uk:/usr/src/sys/arch/amd64/compile/GENERIC.MP
 real mem = 7894605824 (7528MB)
-avail mem = 7638126592 (7284MB)
+avail mem = 7638147072 (7284MB)
 random: good seed from bootblocks
 mpath0 at root
 scsibus0 at mpath0: 256 targets
@@ -18,7 +18,7 @@
 acpihpet0 at acpi0: 14318180 Hz
 acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
 cpu0 at mainbus0: apid 0 (boot processor)
-cpu0: AMD Ryzen 7 4700U with Radeon Graphics, 1996.50 MHz, 17-60-01
+cpu0: AMD Ryzen 7 4700U with Radeon Graphics, 1996.54 MHz, 17-60-01
 cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,PQM,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,IBPB,IBRS,STIBP,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
 cpu0: 32KB 64b/line 8-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 8-way L2 cache
 cpu0: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
@@ -28,6 +28,9 @@
 cpu0: apic clock running at 99MHz
 cpu0: mwait min=64, max=64, C-substates=1.1, IBE
 cpu1 at mainbus0: apid 1 (application processor)
+tsc: cpu0/cpu1 sync round 1: 517 regressions
+tsc: cpu0/cpu1 sync round 1: cpu0 lags cpu1 by 0 cycles
+tsc: cpu0/cpu1 sync round 1: cpu1 lags cpu0 by 20 cycles
 cpu1: AMD Ryzen 7 4700U with Radeon Graphics, 1996.25 MHz, 17-60-01
 cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,PQM,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,IBPB,IBRS,STIBP,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
 cpu1: 32KB 64b/line 8-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 8-way L2 cache
@@ -35,6 +38,9 @@
 cpu1: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
 cpu1: smt 0, core 1, package 0
 cpu2 at mainbus0: apid 2 (application processor)
+tsc: cpu0/cpu2 sync round 1: 1080 regressions
+tsc: cpu0/cpu2 sync round 1: cpu0 lags cpu2 by 0 cycles
+tsc: cpu0/cpu2 sync round 1: cpu2 lags cpu0 by 20 cycles
 cpu2: AMD Ryzen 7 4700U with Radeon Graphics, 1996.25 MHz, 17-60-01
 cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,PQM,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,IBPB,IBRS,STIBP,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
 cpu2: 32KB 64b/line 8-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 8-way L2 cache
@@ -42,6 +48,9 @@
 cpu2: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
 cpu2: smt 0, core 2, package 0
 cpu3 at mainbus0: apid 3 (application processor)
+tsc: cpu0/cpu3 sync round 1: 1188 regressions
+tsc: cpu0/cpu3 sync round 1: cpu0 lags cpu3 by 0 cycles
+tsc: cpu0/cpu3 sync round 1: cpu3 lags cpu0 by 20 cycles
 cpu3: AMD Ryzen 7 4700U with Radeon Graphics, 1996.25 MHz, 17-60-01
 cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,PQM,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,IBPB,IBRS,STIBP,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
 cpu3: 32KB 64b/line 8-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 8-way L2 cache
@@ -49,22 +58,29 @@
 cpu3: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
 cpu3: smt 0, core 3, package 0
 cpu4 at mainbus0: apid 4 (application
Re: amd64: simplify TSC sync testing
This definitely wants testing on Ryzen ThinkPads (e.g. E485/E585/X395/T495s)
or Inspiron 5505, I see user TSC disabled on a lot of those in dmesglog.

On 2022/01/27 10:28, Scott Cheloha wrote:
> Hi,
>
> sthen@ complained recently about a multisocket system not being able
> to use the TSC in userspace because the sync test measured too much
> skew and disabled it.
>
> I don't think there is any real skew on that system. I think the sync
> test is confusing NUMA overhead for skew and issuing a false positive
> result.
>
> Now, we _could_ change the test to account for NUMA overhead. I don't
> know exactly how you would do it, but I imagine you could devise a
> handshake to compute an estimate and then factor that into your skew
> measurement.
>
> Another approach is to drop the current skew measurement handshake in
> favor of a dumber approach without that particular false positive case.
>
> This patch changes our sync test to a dumber approach. Instead of
> trying to measure and correct for skew we only test for lag and do
> not attempt to correct for it if we detect it.
>
> Two CPUs enter a loop and continuously check whether the other CPU's
> TSC lags their own. With this approach the false positive we are
> seeing on sthen@'s machine is impossible because we are only checking
> whether one value lags the other, not whether their difference exceeds
> an arbitrary value. Synchronization is tested to within a margin of
> error because both CPUs are checking for lag at the same time.
>
> To keep the margin of error as small as possible for a given clock
> rate we spin for a relatively long time. Right now we spin for 1
> millisecond per test round. This is arbitrary but it seems to work.
> There is a point of diminishing returns for round duration. This
> sync test approach takes much more time than the current handshake
> approach and I'd like to minimize our impact on boot time.
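
The lag check described so far can be illustrated with a deterministic toy simulation. This Python sketch is not the patch's code and uses no real TSC: two simulated CPUs take turns reading a counter, comparing it with the last value the other CPU made visible, and then publishing their own. `lag_test`, its parameters, the fixed `latency` between a store and the other CPU's next read, and the starting time are all assumptions for illustration.

```python
def lag_test(offset0, offset1, latency=1, iterations=1000):
    """Simulate two CPUs alternately comparing and publishing TSC readings.

    offset0/offset1 are hypothetical TSC offsets from true time, in cycles.
    latency is the number of cycles between one CPU's store and the other
    CPU's next read. Returns (max_lag, regressions), each a per-CPU list.
    """
    offsets = (offset0, offset1)
    published = [None, None]   # last TSC value each CPU made visible
    max_lag = [0, 0]
    regressions = [0, 0]
    true_time = 1_000_000      # arbitrary start; keeps readings positive
    for i in range(iterations):
        cpu = i % 2
        other = 1 - cpu
        now = true_time + offsets[cpu]        # this CPU reads its own TSC
        seen = published[other]
        # The other CPU's value was read earlier in true time. If it is still
        # larger than our fresh reading, our TSC lags the other one.
        if seen is not None and seen > now:
            regressions[cpu] += 1
            max_lag[cpu] = max(max_lag[cpu], seen - now)
        published[cpu] = now
        true_time += latency                  # time passes before the next read
    return max_lag, regressions
```

In this model, synchronized counters never report lag however large `latency` is, which corresponds to the claim that extra overhead cannot produce a false positive. A real offset is only detected when it exceeds `latency`, and is then under-reported by that amount, which is the margin of error the message refers to.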
>
> To actually shrink the margin of error you need to run the CPUs at the
> highest possible clock rate. If they are underclocked due to e.g.
> SpeedStep your margin of error grows and the test may fail to detect
> lag.
>
> We do two rounds of testing for each CPU. This is arbitrary. You
> could do more. I think at least one additional test round is a good
> idea, just in case we "get lucky" in the first round. I think this
> could help mitigate measurement problems introduced by factors beyond
> our control. For example, if one of the CPUs blacks out for the
> duration of the test because it is preempted by the hypervisor the
> test will pass but the result is bogus. If we do more test rounds we
> have a better chance of getting a meaningful result even if we get
> unlucky with hypervisor preemption or something similar.
>
> Misc. notes:
>
> - We no longer have a per-CPU skew value. If any TSC lags another we
>   just set the timecounter quality to a negative number. We don't
>   disable userspace or attempt to correct the skew in the kernel.
>
>   I think that bears repeating: we are no longer correcting the skew.
>   If we detect any lag the TSC is effectively broken.
>
> - There is no longer any concept of TSC drift. Drift is computed
>   from skew change and we have no skew value anymore.
>
> - Am I allowed to printf(9) from a secondary CPU during boot? It
>   seems to hang the kernel but I don't know why.
>
> - I have no idea how to use the volatile keyword correctly. I am
>   trying to keep the compiler from optimizing away my stores. I don't
>   think it implicitly understands that two threads are looking at these
>   variables at the same time.
>
>   If I am abusing it please tell me. I'm trying to avoid a situation
>   where some later compiler change subtly breaks the test. If I
>   sprinkle volatile everywhere my impression is that it forces the
>   compiler to actually do the store.
>
> - I have aligned the global TSC values to 64 bytes to try to minimize
>   "cache bounce". Each value has one reader and one writer so if the
>   two values are on different cache lines a write to one value shouldn't
>   cause a cache miss for the writer of the other value.
>
>   ... right?
>
>   I'm not an expert on cache stuff. Can someone shed light on whether
>   I am doing the right thing here?
>
> - I rolled my own thread barriers for the test. Would a generic
>   thread barrier be useful elsewhere in the kernel? Feels wrong to roll
>   my own synchronization primitive, but maybe we actually don't need
>   them anywhere else?
>
> - I would like to forcibly reset IA32_TSC_ADJUST before running the
>   test but I don't think the CPU feature flags are set early enough
>   for tsc_reset_adjust() to see the relevant flag even if the CPU
>   has that register.
>
>   Could we initialize the flags earlier in boot, before the sync test?
>
> Testing notes:
>
> - Tests on multisocket systems, multiprocessor VMs on various hypervisors,
>   and on systems where the TSC is currently