Re: amd64: simplify TSC sync testing

2022-02-02 Thread Mohamed Aslan
Hello,

I can confirm the same behaviour with this patch applied.

$ sysctl kern.timecounter
kern.timecounter.tick=1
kern.timecounter.timestepwarnings=0
kern.timecounter.hardware=i8254
kern.timecounter.choice=i8254(0) tsc(-1000) acpihpet0(1000) acpitimer0(1000)

$ sysctl hw
hw.machine=amd64
hw.model=AMD Ryzen 5 3500U with Radeon Vega Mobile Gfx
hw.ncpu=8
hw.vendor=LENOVO
hw.version=ThinkPad E495

Regards,
Aslan

On Wed, Feb 02, 2022 at 01:51:18PM -0500, Dave Voutila wrote:
> 
> Jason McIntyre  writes:
> 
> > On Wed, Feb 02, 2022 at 04:52:40PM +, Stuart Henderson wrote:
> >> This definitely wants testing on Ryzen ThinkPads (e.g. 
> >> E485/E585/X395/T495s)
> >> or Inspiron 5505, I see user TSC disabled on a lot of those in dmesglog.
> >>
> >>
> >
> > hi.
> >
> > here are the results from a 5505. was the timecounter meant to switch
> > from tsc?
> >
> > jmc
> >
> > $ sysctl kern.timecounter
> > kern.timecounter.tick=1
> > kern.timecounter.timestepwarnings=0
> > kern.timecounter.hardware=i8254
> > kern.timecounter.choice=i8254(0) tsc(-1000) acpihpet0(1000) acpitimer0(1000)
> >
> 
> I'm seeing the same issue...switching to i8254 pit where before it was
> using tsc. :(
> 
> This is a Lenovo X13. dmesg and sysctl output follows.
> 
> -dv
> 
> $ sysctl kern.timecounter
> kern.timecounter.tick=1
> kern.timecounter.timestepwarnings=0
> kern.timecounter.hardware=i8254
> kern.timecounter.choice=i8254(0) tsc(-1000) acpihpet0(1000) acpitimer0(1000)
> 
> 
> OpenBSD 7.0-current (CUSTOM.MP) #4: Wed Feb  2 13:24:56 EST 2022
> d...@kogelvis2.sisu.home:/usr/src/sys/arch/amd64/compile/CUSTOM.MP
> real mem = 16301219840 (15546MB)
> avail mem = 15664230400 (14938MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 3.2 @ 0xbf711000 (69 entries)
> bios0: vendor LENOVO version "R1CET63W(1.32 )" date 04/09/2021
> bios0: LENOVO 20UF0013US
> acpi0 at bios0: ACPI 6.3
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP SSDT SSDT SSDT IVRS SSDT SSDT TPM2 SSDT MSDM BATB 
> HPET APIC MCFG SBST WSMT VFCT SSDT CRAT CDIT FPDT SSDT SSDT SSDT UEFI SSDT 
> SSDT BGRT
> acpi0: wakeup devices GPP0(S3) RESA(S3) GPP4(S4) GPP5(S3) L850(S3) GPP6(S3) 
> GPP7(S3) GP17(S3) XHC0(S3) XHC1(S3) LID_(S4) SLPB(S3)
> acpitimer0 at acpi0: 3579545 Hz, 32 bits
> acpihpet0 at acpi0: 14318180 Hz
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: AMD Ryzen 5 PRO 4650U with Radeon Graphics, 2096.39 MHz, 17-60-01
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,PQM,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,IBPB,IBRS,STIBP,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu0: 32KB 64b/line 8-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 
> 64b/line 8-way L2 cache
> cpu0: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
> cpu0: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
> cpu0: apic clock running at 99MHz
> cpu0: mwait min=64, max=64, C-substates=1.1, IBE
> cpu1 at mainbus0: apid 1 (application processor)
> cpu1: AMD Ryzen 5 PRO 4650U with Radeon Graphics, 2096.06 MHz, 17-60-01
> cpu1: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,PQM,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,IBPB,IBRS,STIBP,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu1: 32KB 64b/line 8-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 
> 64b/line 8-way L2 cache
> cpu1: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
> cpu1: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
> cpu1: smt 1, core 0, package 0
> cpu2 at mainbus0: apid 2 (application processor)
> tsc: cpu0/cpu2 sync round 1: 1093 regressions
> tsc: cpu0/cpu2 sync round 1: cpu0 lags cpu2 by 0 cycles
> tsc: cpu0/cpu2 sync round 1: cpu2 lags cpu0 by 21 cycles
> cpu2: AMD Ryzen 5 PRO 4650U with Radeon Graphics, 2096.06 MHz, 17-60-01
> cpu2: 
> 

Re: amd64: simplify TSC sync testing

2022-02-02 Thread Dave Voutila


Stuart Henderson  writes:

> Thanks for testing.
>
> On 2022/02/02 13:51, Dave Voutila wrote:
>>
>> Jason McIntyre  writes:
>>
>> > On Wed, Feb 02, 2022 at 04:52:40PM +, Stuart Henderson wrote:
>> >> This definitely wants testing on Ryzen ThinkPads (e.g. 
>> >> E485/E585/X395/T495s)
>> >> or Inspiron 5505, I see user TSC disabled on a lot of those in dmesglog.
>> >>
>> >>
>> >
>> > hi.
>> >
>> > here are the results from a 5505. was the timecounter meant to switch
>> > from tsc?
>> >
>> > jmc
>> >
>> > $ sysctl kern.timecounter
>> > kern.timecounter.tick=1
>> > kern.timecounter.timestepwarnings=0
>> > kern.timecounter.hardware=i8254
>> > kern.timecounter.choice=i8254(0) tsc(-1000) acpihpet0(1000) 
>> > acpitimer0(1000)
>> >
>>
>> I'm seeing the same issue...switching to i8254 pit where before it was
>> using tsc. :(
>
> There are two separate related things, one is the kernel choice, and the
> other is whether TSC can be used directly from userland for gettimeofday
> and friends without a syscall. Does the dmesg without the diff say "user
> TSC disabled"? If so then it was only using it in the kernel.
>

Yup. I believe this Ryzen laptop has always had userland TSC disabled. I
typically see:

  cpu6: disabling user TSC (skew=-105)

-dv



Re: amd64: simplify TSC sync testing

2022-02-02 Thread Scott Cheloha
> On Feb 2, 2022, at 13:29, Stuart Henderson  wrote:
> 
> Thanks for testing.
> 
>> On 2022/02/02 13:51, Dave Voutila wrote:
>> 
>> Jason McIntyre  writes:
>> 
>>> On Wed, Feb 02, 2022 at 04:52:40PM +, Stuart Henderson wrote:
>>>> This definitely wants testing on Ryzen ThinkPads (e.g. 
>>>> E485/E585/X395/T495s)
>>>> or Inspiron 5505, I see user TSC disabled on a lot of those in dmesglog.
>>>> 
>>>> 
>>> 
>>> hi.
>>> 
>>> here are the results from a 5505. was the timecounter meant to switch
>>> from tsc?
>>> 
>>> jmc
>>> 
>>> $ sysctl kern.timecounter
>>> kern.timecounter.tick=1
>>> kern.timecounter.timestepwarnings=0
>>> kern.timecounter.hardware=i8254
>>> kern.timecounter.choice=i8254(0) tsc(-1000) acpihpet0(1000) acpitimer0(1000)
>>> 
>> 
>> I'm seeing the same issue...switching to i8254 pit where before it was
>> using tsc. :(
> 
> There are two separate related things, one is the kernel choice, and the
> other is whether TSC can be used directly from userland for gettimeofday
> and friends without a syscall. Does the dmesg without the diff say "user
> TSC disabled"? If so then it was only using it in the kernel.
> 
> From reading the diff, I do expect that tsc priority is dropped if the
> measurements indicate problems, but I wonder why it falls back to i8254
> even though acpihpet/acpitimer are available and higher priority..

Because we drop the TSC quality after
adding it as the active timecounter.

This violates assumptions in kern_tc.c.

If i8254 is added last, i8254 has a higher
quality than the TSC and is made the active
counter.  The other counters don't factor
in because the code assumed the active
counter is the highest quality counter.

... which would be true if we weren't changing
the quality after calling tc_init().
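
A toy model of that sequence, as a sketch only.  It is not the
kern_tc.c code: the toy_* names, the registration order, and the
starting tsc quality of 2000 are assumptions made for illustration.
The final qualities are the ones in the sysctl output quoted above.
The only point is that a newcomer is compared against the active
counter and nothing else.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a timecounter: just a name and a quality. */
struct toy_tc {
	const char *name;
	int quality;
};

/* The currently active counter; NULL until the first registration. */
struct toy_tc *toy_active;

/*
 * Model of the registration step: the newcomer is compared only
 * against the active counter, never against the full list.
 */
void
toy_tc_init(struct toy_tc *tc)
{
	assert(tc != NULL);
	if (tc->quality < 0)
		return;		/* never auto-select a negative quality */
	if (toy_active != NULL && tc->quality < toy_active->quality)
		return;		/* active counter is assumed to be the best */
	toy_active = tc;
}

/*
 * Replay the (assumed) boot order and return the name of the counter
 * that ends up active.  If tsc_fails_sync is nonzero, the tsc quality
 * is lowered after registration but before i8254 is added.
 */
const char *
toy_replay_boot(int tsc_fails_sync)
{
	struct toy_tc acpitimer = { "acpitimer0", 1000 };
	struct toy_tc acpihpet = { "acpihpet0", 1000 };
	struct toy_tc tsc = { "tsc", 2000 };	/* assumed initial quality */
	struct toy_tc i8254 = { "i8254", 0 };

	toy_active = NULL;
	toy_tc_init(&acpitimer);
	toy_tc_init(&acpihpet);
	toy_tc_init(&tsc);		/* tsc becomes the active counter */
	if (tsc_fails_sync)
		tsc.quality = -1000;	/* changed behind the list's back */
	toy_tc_init(&i8254);		/* compared against tsc only */
	return toy_active->name;	/* string literal, safe to return */
}
```

Under these assumptions toy_replay_boot(1) returns "i8254" and
toy_replay_boot(0) returns "tsc".  In neither case are acpihpet0 or
acpitimer0 looked at again.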



Re: amd64: simplify TSC sync testing

2022-02-02 Thread Stuart Henderson
Thanks for testing.

On 2022/02/02 13:51, Dave Voutila wrote:
> 
> Jason McIntyre  writes:
> 
> > On Wed, Feb 02, 2022 at 04:52:40PM +, Stuart Henderson wrote:
> >> This definitely wants testing on Ryzen ThinkPads (e.g. 
> >> E485/E585/X395/T495s)
> >> or Inspiron 5505, I see user TSC disabled on a lot of those in dmesglog.
> >>
> >>
> >
> > hi.
> >
> > here are the results from a 5505. was the timecounter meant to switch
> > from tsc?
> >
> > jmc
> >
> > $ sysctl kern.timecounter
> > kern.timecounter.tick=1
> > kern.timecounter.timestepwarnings=0
> > kern.timecounter.hardware=i8254
> > kern.timecounter.choice=i8254(0) tsc(-1000) acpihpet0(1000) acpitimer0(1000)
> >
> 
> I'm seeing the same issue...switching to i8254 pit where before it was
> using tsc. :(

There are two separate related things, one is the kernel choice, and the
other is whether TSC can be used directly from userland for gettimeofday
and friends without a syscall. Does the dmesg without the diff say "user
TSC disabled"? If so then it was only using it in the kernel.

From reading the diff, I do expect that tsc priority is dropped if the
measurements indicate problems, but I wonder why it falls back to i8254
even though acpihpet/acpitimer are available and higher priority..

> This is a Lenovo X13. dmesg and sysctl output follows.
> 
> -dv
> 
> $ sysctl kern.timecounter
> kern.timecounter.tick=1
> kern.timecounter.timestepwarnings=0
> kern.timecounter.hardware=i8254
> kern.timecounter.choice=i8254(0) tsc(-1000) acpihpet0(1000) acpitimer0(1000)

> 
> 
> OpenBSD 7.0-current (CUSTOM.MP) #4: Wed Feb  2 13:24:56 EST 2022
> d...@kogelvis2.sisu.home:/usr/src/sys/arch/amd64/compile/CUSTOM.MP
> real mem = 16301219840 (15546MB)
> avail mem = 15664230400 (14938MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 3.2 @ 0xbf711000 (69 entries)
> bios0: vendor LENOVO version "R1CET63W(1.32 )" date 04/09/2021
> bios0: LENOVO 20UF0013US
> acpi0 at bios0: ACPI 6.3
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP SSDT SSDT SSDT IVRS SSDT SSDT TPM2 SSDT MSDM BATB 
> HPET APIC MCFG SBST WSMT VFCT SSDT CRAT CDIT FPDT SSDT SSDT SSDT UEFI SSDT 
> SSDT BGRT
> acpi0: wakeup devices GPP0(S3) RESA(S3) GPP4(S4) GPP5(S3) L850(S3) GPP6(S3) 
> GPP7(S3) GP17(S3) XHC0(S3) XHC1(S3) LID_(S4) SLPB(S3)
> acpitimer0 at acpi0: 3579545 Hz, 32 bits
> acpihpet0 at acpi0: 14318180 Hz
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: AMD Ryzen 5 PRO 4650U with Radeon Graphics, 2096.39 MHz, 17-60-01
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,PQM,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,IBPB,IBRS,STIBP,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu0: 32KB 64b/line 8-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 
> 64b/line 8-way L2 cache
> cpu0: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
> cpu0: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
> cpu0: apic clock running at 99MHz
> cpu0: mwait min=64, max=64, C-substates=1.1, IBE
> cpu1 at mainbus0: apid 1 (application processor)
> cpu1: AMD Ryzen 5 PRO 4650U with Radeon Graphics, 2096.06 MHz, 17-60-01
> cpu1: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,PQM,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,IBPB,IBRS,STIBP,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu1: 32KB 64b/line 8-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 
> 64b/line 8-way L2 cache
> cpu1: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
> cpu1: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
> cpu1: smt 1, core 0, package 0
> cpu2 at mainbus0: apid 2 (application processor)
> tsc: cpu0/cpu2 sync round 1: 1093 regressions
> tsc: cpu0/cpu2 sync round 1: cpu0 lags cpu2 by 0 cycles
> tsc: cpu0/cpu2 sync round 1: cpu2 lags cpu0 by 21 cycles
> cpu2: AMD Ryzen 5 PRO 4650U with Radeon Graphics, 2096.06 MHz, 17-60-01
> cpu2: 
> 

Re: amd64: simplify TSC sync testing

2022-02-02 Thread Dave Voutila


Jason McIntyre  writes:

> On Wed, Feb 02, 2022 at 04:52:40PM +, Stuart Henderson wrote:
>> This definitely wants testing on Ryzen ThinkPads (e.g. E485/E585/X395/T495s)
>> or Inspiron 5505, I see user TSC disabled on a lot of those in dmesglog.
>>
>>
>
> hi.
>
> here are the results from a 5505. was the timecounter meant to switch
> from tsc?
>
> jmc
>
> $ sysctl kern.timecounter
> kern.timecounter.tick=1
> kern.timecounter.timestepwarnings=0
> kern.timecounter.hardware=i8254
> kern.timecounter.choice=i8254(0) tsc(-1000) acpihpet0(1000) acpitimer0(1000)
>

I'm seeing the same issue...switching to i8254 pit where before it was
using tsc. :(

This is a Lenovo X13. dmesg and sysctl output follows.

-dv

$ sysctl kern.timecounter
kern.timecounter.tick=1
kern.timecounter.timestepwarnings=0
kern.timecounter.hardware=i8254
kern.timecounter.choice=i8254(0) tsc(-1000) acpihpet0(1000) acpitimer0(1000)


OpenBSD 7.0-current (CUSTOM.MP) #4: Wed Feb  2 13:24:56 EST 2022
d...@kogelvis2.sisu.home:/usr/src/sys/arch/amd64/compile/CUSTOM.MP
real mem = 16301219840 (15546MB)
avail mem = 15664230400 (14938MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.2 @ 0xbf711000 (69 entries)
bios0: vendor LENOVO version "R1CET63W(1.32 )" date 04/09/2021
bios0: LENOVO 20UF0013US
acpi0 at bios0: ACPI 6.3
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP SSDT SSDT SSDT IVRS SSDT SSDT TPM2 SSDT MSDM BATB HPET 
APIC MCFG SBST WSMT VFCT SSDT CRAT CDIT FPDT SSDT SSDT SSDT UEFI SSDT SSDT BGRT
acpi0: wakeup devices GPP0(S3) RESA(S3) GPP4(S4) GPP5(S3) L850(S3) GPP6(S3) 
GPP7(S3) GP17(S3) XHC0(S3) XHC1(S3) LID_(S4) SLPB(S3)
acpitimer0 at acpi0: 3579545 Hz, 32 bits
acpihpet0 at acpi0: 14318180 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD Ryzen 5 PRO 4650U with Radeon Graphics, 2096.39 MHz, 17-60-01
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,PQM,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,IBPB,IBRS,STIBP,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu0: 32KB 64b/line 8-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 
8-way L2 cache
cpu0: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu0: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=1.1, IBE
cpu1 at mainbus0: apid 1 (application processor)
cpu1: AMD Ryzen 5 PRO 4650U with Radeon Graphics, 2096.06 MHz, 17-60-01
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,PQM,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,IBPB,IBRS,STIBP,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu1: 32KB 64b/line 8-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 
8-way L2 cache
cpu1: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu1: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu1: smt 1, core 0, package 0
cpu2 at mainbus0: apid 2 (application processor)
tsc: cpu0/cpu2 sync round 1: 1093 regressions
tsc: cpu0/cpu2 sync round 1: cpu0 lags cpu2 by 0 cycles
tsc: cpu0/cpu2 sync round 1: cpu2 lags cpu0 by 21 cycles
cpu2: AMD Ryzen 5 PRO 4650U with Radeon Graphics, 2096.06 MHz, 17-60-01
cpu2: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,PQM,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,IBPB,IBRS,STIBP,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu2: 32KB 64b/line 8-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 
8-way L2 cache
cpu2: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu2: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu2: smt 0, core 1, package 0
cpu3 at mainbus0: apid 3 (application processor)
cpu3: AMD Ryzen 5 PRO 4650U with Radeon Graphics, 2096.06 MHz, 17-60-01
cpu3: 

Re: amd64: simplify TSC sync testing

2022-02-02 Thread Jason McIntyre
On Wed, Feb 02, 2022 at 04:52:40PM +, Stuart Henderson wrote:
> This definitely wants testing on Ryzen ThinkPads (e.g. E485/E585/X395/T495s)
> or Inspiron 5505, I see user TSC disabled on a lot of those in dmesglog.
> 
> 

hi.

here are the results from a 5505. was the timecounter meant to switch
from tsc?

jmc

$ sysctl kern.timecounter
kern.timecounter.tick=1
kern.timecounter.timestepwarnings=0
kern.timecounter.hardware=i8254
kern.timecounter.choice=i8254(0) tsc(-1000) acpihpet0(1000) acpitimer0(1000)

--- dmesg.old   Wed Feb  2 17:44:00 2022
+++ dmesg.new   Wed Feb  2 17:49:19 2022
@@ -1,7 +1,7 @@
-OpenBSD 7.0-current (GENERIC.MP) #20: Wed Feb  2 08:06:29 GMT 2022
+OpenBSD 7.0-current (GENERIC.MP) #21: Wed Feb  2 17:47:27 GMT 2022
 j...@manila.kerhand.co.uk:/usr/src/sys/arch/amd64/compile/GENERIC.MP
 real mem = 7894605824 (7528MB)
-avail mem = 7638126592 (7284MB)
+avail mem = 7638147072 (7284MB)
 random: good seed from bootblocks
 mpath0 at root
 scsibus0 at mpath0: 256 targets
@@ -18,7 +18,7 @@
 acpihpet0 at acpi0: 14318180 Hz
 acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
 cpu0 at mainbus0: apid 0 (boot processor)
-cpu0: AMD Ryzen 7 4700U with Radeon Graphics, 1996.50 MHz, 17-60-01
+cpu0: AMD Ryzen 7 4700U with Radeon Graphics, 1996.54 MHz, 17-60-01
 cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,PQM,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,IBPB,IBRS,STIBP,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
 cpu0: 32KB 64b/line 8-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 
8-way L2 cache
 cpu0: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
@@ -28,6 +28,9 @@
 cpu0: apic clock running at 99MHz
 cpu0: mwait min=64, max=64, C-substates=1.1, IBE
 cpu1 at mainbus0: apid 1 (application processor)
+tsc: cpu0/cpu1 sync round 1: 517 regressions
+tsc: cpu0/cpu1 sync round 1: cpu0 lags cpu1 by 0 cycles
+tsc: cpu0/cpu1 sync round 1: cpu1 lags cpu0 by 20 cycles
 cpu1: AMD Ryzen 7 4700U with Radeon Graphics, 1996.25 MHz, 17-60-01
 cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,PQM,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,IBPB,IBRS,STIBP,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
 cpu1: 32KB 64b/line 8-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 
8-way L2 cache
@@ -35,6 +38,9 @@
 cpu1: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
 cpu1: smt 0, core 1, package 0
 cpu2 at mainbus0: apid 2 (application processor)
+tsc: cpu0/cpu2 sync round 1: 1080 regressions
+tsc: cpu0/cpu2 sync round 1: cpu0 lags cpu2 by 0 cycles
+tsc: cpu0/cpu2 sync round 1: cpu2 lags cpu0 by 20 cycles
 cpu2: AMD Ryzen 7 4700U with Radeon Graphics, 1996.25 MHz, 17-60-01
 cpu2: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,PQM,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,IBPB,IBRS,STIBP,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
 cpu2: 32KB 64b/line 8-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 
8-way L2 cache
@@ -42,6 +48,9 @@
 cpu2: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
 cpu2: smt 0, core 2, package 0
 cpu3 at mainbus0: apid 3 (application processor)
+tsc: cpu0/cpu3 sync round 1: 1188 regressions
+tsc: cpu0/cpu3 sync round 1: cpu0 lags cpu3 by 0 cycles
+tsc: cpu0/cpu3 sync round 1: cpu3 lags cpu0 by 20 cycles
 cpu3: AMD Ryzen 7 4700U with Radeon Graphics, 1996.25 MHz, 17-60-01
 cpu3: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,PQM,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,IBPB,IBRS,STIBP,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
 cpu3: 32KB 64b/line 8-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 
8-way L2 cache
@@ -49,22 +58,29 @@
 cpu3: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
 cpu3: smt 0, core 3, package 0
 cpu4 at mainbus0: apid 4 (application 

Re: amd64: simplify TSC sync testing

2022-02-02 Thread Stuart Henderson
This definitely wants testing on Ryzen ThinkPads (e.g. E485/E585/X395/T495s)
or Inspiron 5505, I see user TSC disabled on a lot of those in dmesglog.


On 2022/01/27 10:28, Scott Cheloha wrote:
> Hi,
> 
> sthen@ complained recently about a multisocket system not being able
> to use the TSC in userspace because the sync test measured too much
> skew and disabled it.
> 
> I don't think there is any real skew on that system.  I think the sync
> test is confusing NUMA overhead for skew and issuing a false positive
> result.
> 
> Now, we _could_ change the test to account for NUMA overhead.  I don't
> know exactly how you would do it, but I imagine you could devise a
> handshake to compute an estimate and then factor that into your skew
> measurement.
> 
> Another approach is to drop the current skew measurement handshake in
> favor of a dumber approach without that particular false positive case.
> 
> This patch changes our sync test to a dumber approach.  Instead of
> trying to measure and correct for skew we only test for lag and do
> not attempt to correct for it if we detect it.
> 
> Two CPUs enter a loop and continuously check whether the other CPU's
> TSC lags their own.  With this approach the false positive we are
> seeing on sthen@'s machine is impossible because we are only checking
> whether one value lags the other, not whether their difference exceeds
> an arbitrary value.  Synchronization is tested to within a margin of
> error because both CPUs are checking for lag at the same time.
> 
> To keep the margin of error as small as possible for a given clock
> rate we spin for a relatively long time.  Right now we spin for 1
> millisecond per test round.  This is arbitrary but it seems to work.
> There is a point of diminishing returns for round duration.  This
> sync test approach takes much more time than the current handshake
> approach and I'd like to minimize our impact on boot time.
> 
> To actually shrink the margin of error you need to run the CPUs at the
> highest possible clock rate.  If they are underclocked due to e.g.
> SpeedStep your margin of error grows and the test may fail to detect
> lag.
> 
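The lag check and its margin of error can be illustrated with a
single-threaded simulation.  This is a sketch, not the code from the
diff: the toy_* names, the fixed "latency" between the peer's store
and our read, and the tick units are all assumptions.  One CPU
publishes its counter, and some ticks later the other CPU reads its
own counter and compares.

```c
#include <assert.h>
#include <stdint.h>

/* Result of one simulated test round. */
struct toy_lag_result {
	uint64_t regressions;	/* checks where we read an earlier time */
	uint64_t max_lag;	/* largest observed lag, in ticks */
};

/*
 * Simulate one CPU checking for lag against a peer.  The peer
 * publishes its counter; `latency` ticks later the checking CPU reads
 * its own counter, which runs `skew` ticks behind the peer's.
 */
struct toy_lag_result
toy_lag_round(uint64_t skew, uint64_t latency, uint64_t iterations)
{
	struct toy_lag_result r = { 0, 0 };
	uint64_t now = 1000;	/* arbitrary start of "true" time */
	uint64_t i, peer_value, my_value;

	assert(skew <= now);	/* keep the subtraction below in range */
	for (i = 0; i < iterations; i++) {
		peer_value = now;	/* peer publishes its counter */
		now += latency;		/* time passes before we look */
		my_value = now - skew;	/* our counter trails by skew */
		if (my_value < peer_value) {
			r.regressions++;
			if (peer_value - my_value > r.max_lag)
				r.max_lag = peer_value - my_value;
		}
	}
	return r;
}
```

In this model a skew of 20 ticks with a latency of 5 ticks is caught
on every iteration with an observed lag of 15, while the same skew
with a latency of 25 ticks is never caught.  The latency acts as the
margin of error, so anything that stretches it can hide real lag.
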
> We do two rounds of testing for each CPU.  This is arbitrary.  You
> could do more.  I think at least one additional test round is a good
> idea, just in case we "get lucky" in the first round.  I think this
> could help mitigate measurement problems introduced by factors beyond
> our control.  For example, if one of the CPUs blacks out for the
> duration of the test because it is preempted by the hypervisor the
> test will pass but the result is bogus.  If we do more test rounds we
> have a better chance of getting a meaningful result even if we get
> unlucky with hypervisor preemption or something similar.
> 
> Misc. notes:
> 
> - We no longer have a per-CPU skew value.  If any TSC lags another we
>   just set the timecounter quality to a negative number.  We don't
>   disable userspace or attempt to correct the skew in the kernel.
> 
>   I think that bears repeating: we are no longer correcting the skew.
>   If we detect any lag the TSC is effectively broken.
> 
> - There is no longer any concept of TSC drift.  Drift is computed
>   from skew change and we have no skew value anymore.
> 
> - Am I allowed to printf(9) from a secondary CPU during boot?  It
>   seems to hang the kernel but I don't know why.
> 
> - I have no idea how to use the volatile keyword correctly.  I am
>   trying to keep the compiler from optimizing away my stores.  I don't
>   think it implicitly understands that two threads are looking at these
>   variables at the same time.
> 
>   If I am abusing it please tell me.  I'm trying to avoid a situation
>   where some later compiler change subtly breaks the test.  If I
>   sprinkle volatile everywhere my impression is that it forces the
>   compiler to actually do the store.
> 
> - I have aligned the global TSC values to 64 bytes to try to minimize
>   "cache bounce".  Each value has one reader and one writer so if the
>   two values are on different cache lines a write to one value shouldn't
>   cause a cache miss for the writer of the other value.
> 
>   ... right?
> 
>   I'm not an expert on cache stuff.  Can someone shed light on whether
>   I am doing the right thing here?
> 
> - I rolled my own thread barriers for the test.  Would a generic
>   thread barrier be useful elsewhere in the kernel?  Feels wrong to roll
>   my own synchronization primitive, but maybe we actually don't need
>   them anywhere else?
> 
> - I would like to forcibly reset IA32_TSC_ADJUST before running the
>   test but I don't think the CPU feature flags are set early enough
>   for tsc_reset_adjust() to see the relevant flag even if the CPU
>   has that register.
> 
>   Could we initialize the flags earlier in boot, before the sync test?
> 
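On the 64-byte alignment note: a minimal C11 sketch of putting each
published value on its own cache line.  The toy_* names and the
two-slot layout are assumptions, not taken from the diff.  The 64-byte
line size matches the "64b/line" cache lines in the dmesgs posted in
this thread.  The volatile qualifier only mirrors the description
above; it is not a memory barrier.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define TOY_CACHELINE 64	/* assumed cache line size in bytes */

/* One published TSC value per CPU, each padded to a full cache line. */
struct toy_tsc_slot {
	alignas(TOY_CACHELINE) volatile uint64_t value;
};

/* Slot 0 is written by one CPU, slot 1 by the other. */
struct toy_tsc_slot toy_slots[2];

/* Return nonzero if the two slots cannot share a cache line. */
int
toy_slots_separated(void)
{
	uintptr_t a = (uintptr_t)&toy_slots[0];
	uintptr_t b = (uintptr_t)&toy_slots[1];

	assert(b > a);		/* array elements are laid out in order */
	return (a % TOY_CACHELINE == 0 && b % TOY_CACHELINE == 0 &&
	    b - a >= TOY_CACHELINE);
}
```

Because the member is aligned to 64 bytes, the struct is padded to 64
bytes as well, so adjacent array elements start on different lines
and a store to one slot does not dirty the line holding the other.
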
> Testing notes:
> 
> - Tests on multisocket systems, multiprocessor VMs on various hypervisors,
>   and on systems where the TSC is currently