Re: [Intel AlderLake] Read files to FAT32 or UFS partition cause data corrupt due to P-Core-Core

2023-09-18 Thread Konstantin Belousov
On Mon, Sep 18, 2023 at 01:27:55PM -0500, Mike Karels wrote:
> On 18 Sep 2023, at 10:38, Michael Butler wrote:
> 
> > On 8/8/23 13:50, Michael Butler wrote:
> >> On 8/8/23 10:56, Tomoaki AOKI wrote:
> >>> On Tue, 8 Aug 2023 17:02:32 +0300
> >>> Konstantin Belousov  wrote:
> >>
> >>   [ .. snip .. ]
> >>
> >>>> The workaround is switched on automatically, when kernel detects
> >>>> 'small cores' reported by CPUID.
> >>>
> >>> If I read the code correctly, vm.pmap.pcid_invlpg_workaround
> >>> (precisely, the corresponding variable) is set to non-zero when the
> >>> workaround is enabled. Not sure it was detected correctly in the
> >>> original reporter's environment, but forcibly setting the tunable to 1
> >>> was not reported to help sufficiently.
> >>> Currently, only setting tunable vm.pmap.pcid_enabled to 0 could help.
> >>
> >> I'm seeing similar stability problems on an N95-based device. This too is 
> >> an Alderlake-N device with only E-cores although I'm running it with a 
> >> compilation with CPUTYPE=tremont .. from an older, verbose start-up ..
> >>
> >> PPIM 0: PA=0x40, VA=0x8271, size=0x1d5000, mode=0x1
> >> pmap: large map 8 PML4 slots (4096 GB)
> >> VT(efifb): resolution 800x600
> >> Preloaded elf kernel "/boot/kernel.new/kernel" at 0x8234e000.
> >> Preloaded boot_entropy_cache "/boot/entropy" at 0x82357d08.
> >> Preloaded cpu_microcode "/boot/firmware/intel-ucode.bin" at 0x82357d60.
> >> Preloaded hostuuid "/etc/hostid" at 0x82357dc0.
> >> Preloaded TSLOG data "TSLOG" at 0x82357e10.
> >> CPU: Intel(R) N95 (1689.60-MHz K8-class CPU)
> >>    Origin="GenuineIntel"  Id=0xb06e0  Family=0x6  Model=0xbe  Stepping=0
> >>    Features=0xbfebfbff
> >>    Features2=0x7ffafbbf
> >>    AMD Features=0x2c100800
> >>    AMD Features2=0x121
> >>    Structured Extended Features=0x239ca7eb
> >>    Structured Extended Features2=0x98c007bc
> >>    Structured Extended Features3=0xfc184410
> >>    XSAVE Features=0xf
> >>    IA32_ARCH_CAPS=0x180fd6b
> >>    VT-x: Basic Features=0x3da0500
> >>      Pin-Based Controls=0xff
> >>      Primary Processor Controls=0xfffbfffe
> >>      Secondary Processor Controls=0x75d7fff
> >>      Exit Controls=0x3da0500
> >>      Entry Controls=0x3da0500
> >>      EPT Features=0x6f34141
> >>      VPID Features=0xf01
> >>    TSC: P-state invariant, performance statistics
> >> 64-Byte prefetching
> >> L2 cache: 2048 kbytes, 16-way associative, 64 bytes/line
> >> real memory  = 17179869184 (16384 MB)
> >> Physical memory chunk(s):
> >> 0x0001 - 0x0009dfff, 581632 bytes (142 pages)
> >> 0x0009f000 - 0x0009, 4096 bytes (1 pages)
> >> 0x0010 - 0x5fff, 1609564160 bytes (392960 pages)
> >> 0x62401000 - 0x7264dfff, 270848000 bytes (66125 pages)
> >> 0x75fff000 - 0x75ff, 4096 bytes (1 pages)
> >> 0x00011000 - 0x000462497fff, 14533881856 bytes (3548311 pages)
> >> 0x00047fa0 - 0x00047fb68fff, 1478656 bytes (361 pages)
> >> avail memory = 16363008000 (15604 MB)
> >> CPU microcode: updated from 0xc to 0x10
> >
> > With the most recent microcode update, this device reports ..
> >
> > CPU microcode: updated from 0xc to 0x11
> >
> >  .. and is now stable with vm.pmap.pcid_enabled=0, 
> > vm.pmap.pcid_invlpg_workaround=1, and CPUTYPE?=alderlake set in 
> > /etc/make.conf over multiple full system builds.
> >
> > I have not tested with vm.pmap.pcid_invlpg_workaround=0.
> 
> I believe that vm.pmap.pcid_invlpg_workaround does not matter if
> vm.pmap.pcid_enabled=0.  Enabling the workaround or disabling pcid should
> be basically the same for this CPU, so I don't understand why that isn't
> true.
No, pcid != workaround.  The bug is that the INVLPG instruction leaves some
entries in the TLB which it should not, when PCID is enabled.  The workaround
uses a different instruction to ensure that no ghost entries are left.

So it is possible that there is some other issue which produces a similar
visible effect with PCID enabled.

The problem is most likely because this is the first Atom micro-arch where
Intel added PCID.
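
To make the distinction above concrete, here is a minimal sketch in C of how
the two tunables combine.  It only models the description in this thread; it
is not the kernel's pmap code, and the enum and function names are
hypothetical.  The "different instruction" is deliberately left unnamed.

```c
#include <stdbool.h>

/* Hypothetical labels for the three cases discussed in the thread. */
enum invl_method {
	INVL_PLAIN_INVLPG,	/* PCID off: INVLPG is used, the bug does not apply */
	INVL_INVLPG_AFFECTED,	/* PCID on, workaround off: INVLPG may leave ghost entries */
	INVL_ALTERNATE		/* PCID on, workaround on: a different instruction is used */
};

/*
 * Model of how vm.pmap.pcid_enabled and vm.pmap.pcid_invlpg_workaround
 * interact.  With PCID disabled the workaround setting has no effect.
 */
static inline enum invl_method
choose_invl_method(bool pcid_enabled, bool invlpg_workaround)
{
	if (!pcid_enabled)
		return INVL_PLAIN_INVLPG;
	return invlpg_workaround ? INVL_ALTERNATE : INVL_INVLPG_AFFECTED;
}
```

In this model, pcid_enabled=0 gives the same result whatever the workaround
is set to, while pcid_enabled=1 with the workaround is a separate third case.
That is why a remaining failure in the third case would point to some other
issue.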

> It might be interesting to test with pcid enabled with the new
> microcode, although I don't see why that would affect the results (pcid
> should still not be used on any CPU).
Yes, testing a microcode with the fix for the original issue would be
interesting.  It is not a nop, see above.

> 
> The CPUTYPE for the compiler should not affect the pcid vm issues, just
> change the optimization by the compiler.

True as well.



Re: [Intel AlderLake] Read files to FAT32 or UFS partition cause data corrupt due to P-Core-Core

2023-09-18 Thread Michael Butler

On 9/18/23 14:27, Mike Karels wrote:

 [ .. snip .. ]


>>> avail memory = 16363008000 (15604 MB)
>>> CPU microcode: updated from 0xc to 0x10
>>
>> With the most recent microcode update, this device reports ..
>>
>> CPU microcode: updated from 0xc to 0x11
>>
>>  .. and is now stable with vm.pmap.pcid_enabled=0,
>> vm.pmap.pcid_invlpg_workaround=1, and CPUTYPE?=alderlake set in
>> /etc/make.conf over multiple full system builds.
>>
>> I have not tested with vm.pmap.pcid_invlpg_workaround=0.


 .. sorry that was a typo .. I'm actually using ..

vm.pmap.pcid_invlpg_workaround: 1
vm.pmap.invpcid_works: 1
vm.pmap.pcid_enabled: 1


> I believe that vm.pmap.pcid_invlpg_workaround does not matter if
> vm.pmap.pcid_enabled=0.  Enabling the workaround or disabling pcid should
> be basically the same for this CPU, so I don't understand why that isn't
> true.  It might be interesting to test with pcid enabled with the new
> microcode, although I don't see why that would affect the results (pcid
> should still not be used on any CPU).
>
> The CPUTYPE for the compiler should not affect the pcid vm issues, just
> change the optimization by the compiler.


Agreed. However, I was previously using CPUTYPE?=tremont so I just
wanted to note that two things had changed in my testing, microcode and
CPUTYPE, not just one.


Michael




Re: [Intel AlderLake] Read files to FAT32 or UFS partition cause data corrupt due to P-Core-Core

2023-09-18 Thread Mike Karels
On 18 Sep 2023, at 10:38, Michael Butler wrote:

> On 8/8/23 13:50, Michael Butler wrote:
>> On 8/8/23 10:56, Tomoaki AOKI wrote:
>>> On Tue, 8 Aug 2023 17:02:32 +0300
>>> Konstantin Belousov  wrote:
>>
>>   [ .. snip .. ]
>>
>>>> The workaround is switched on automatically, when kernel detects
>>>> 'small cores' reported by CPUID.
>>>
>>> If I read the code correctly, vm.pmap.pcid_invlpg_workaround
>>> (precisely, the corresponding variable) is set to non-zero when the
>>> workaround is enabled. Not sure it was detected correctly in the
>>> original reporter's environment, but forcibly setting the tunable to 1
>>> was not reported to help sufficiently.
>>> Currently, only setting tunable vm.pmap.pcid_enabled to 0 could help.
>>
>> I'm seeing similar stability problems on an N95-based device. This too is an 
>> Alderlake-N device with only E-cores although I'm running it with a 
>> compilation with CPUTYPE=tremont .. from an older, verbose start-up ..
>>
>> PPIM 0: PA=0x40, VA=0x8271, size=0x1d5000, mode=0x1
>> pmap: large map 8 PML4 slots (4096 GB)
>> VT(efifb): resolution 800x600
>> Preloaded elf kernel "/boot/kernel.new/kernel" at 0x8234e000.
>> Preloaded boot_entropy_cache "/boot/entropy" at 0x82357d08.
>> Preloaded cpu_microcode "/boot/firmware/intel-ucode.bin" at 0x82357d60.
>> Preloaded hostuuid "/etc/hostid" at 0x82357dc0.
>> Preloaded TSLOG data "TSLOG" at 0x82357e10.
>> CPU: Intel(R) N95 (1689.60-MHz K8-class CPU)
>>    Origin="GenuineIntel"  Id=0xb06e0  Family=0x6  Model=0xbe  Stepping=0
>>    Features=0xbfebfbff
>>    Features2=0x7ffafbbf
>>    AMD Features=0x2c100800
>>    AMD Features2=0x121
>>    Structured Extended Features=0x239ca7eb
>>    Structured Extended Features2=0x98c007bc
>>    Structured Extended Features3=0xfc184410
>>    XSAVE Features=0xf
>>    IA32_ARCH_CAPS=0x180fd6b
>>    VT-x: Basic Features=0x3da0500
>>      Pin-Based Controls=0xff
>>      Primary Processor Controls=0xfffbfffe
>>      Secondary Processor Controls=0x75d7fff
>>      Exit Controls=0x3da0500
>>      Entry Controls=0x3da0500
>>      EPT Features=0x6f34141
>>      VPID Features=0xf01
>>    TSC: P-state invariant, performance statistics
>> 64-Byte prefetching
>> L2 cache: 2048 kbytes, 16-way associative, 64 bytes/line
>> real memory  = 17179869184 (16384 MB)
>> Physical memory chunk(s):
>> 0x0001 - 0x0009dfff, 581632 bytes (142 pages)
>> 0x0009f000 - 0x0009, 4096 bytes (1 pages)
>> 0x0010 - 0x5fff, 1609564160 bytes (392960 pages)
>> 0x62401000 - 0x7264dfff, 270848000 bytes (66125 pages)
>> 0x75fff000 - 0x75ff, 4096 bytes (1 pages)
>> 0x00011000 - 0x000462497fff, 14533881856 bytes (3548311 pages)
>> 0x00047fa0 - 0x00047fb68fff, 1478656 bytes (361 pages)
>> avail memory = 16363008000 (15604 MB)
>> CPU microcode: updated from 0xc to 0x10
>
> With the most recent microcode update, this device reports ..
>
> CPU microcode: updated from 0xc to 0x11
>
>  .. and is now stable with vm.pmap.pcid_enabled=0, 
> vm.pmap.pcid_invlpg_workaround=1, and CPUTYPE?=alderlake set in 
> /etc/make.conf over multiple full system builds.
>
> I have not tested with vm.pmap.pcid_invlpg_workaround=0.

I believe that vm.pmap.pcid_invlpg_workaround does not matter if
vm.pmap.pcid_enabled=0.  Enabling the workaround or disabling pcid should
be basically the same for this CPU, so I don't understand why that isn't
true.  It might be interesting to test with pcid enabled with the new
microcode, although I don't see why that would affect the results (pcid
should still not be used on any CPU).

The CPUTYPE for the compiler should not affect the pcid vm issues, just
change the optimization by the compiler.

Mike

>> On start-up, vm.pmap.pcid_invlpg_workaround=1 but seemingly random faults 
>> still occurred under load, for example, 'make buildworld'. Apparent misreads 
>> of source-files resulting in syntax errors were the most common symptom. 
>> Compilation reattempts (mostly) succeed.
>>
>> Initially, I put this down to an inadequate power-supply but setting 
>> vm.pmap.pcid_enabled=0 seems to have stabilised it.
>>
>> I guess there's another dragon in there .. :-(
>>
>>  Michael



Re: [Intel AlderLake] Read files to FAT32 or UFS partition cause data corrupt due to P-Core-Core

2023-09-18 Thread Michael Butler

On 8/8/23 13:50, Michael Butler wrote:
> On 8/8/23 10:56, Tomoaki AOKI wrote:
>> On Tue, 8 Aug 2023 17:02:32 +0300
>> Konstantin Belousov wrote:
>
>   [ .. snip .. ]
>
>>> The workaround is switched on automatically, when kernel detects
>>> 'small cores' reported by CPUID.
>>
>> If I read the code correctly, vm.pmap.pcid_invlpg_workaround
>> (precisely, the corresponding variable) is set to non-zero when the
>> workaround is enabled. Not sure it was detected correctly in the
>> original reporter's environment, but forcibly setting the tunable to 1
>> was not reported to help sufficiently.
>> Currently, only setting tunable vm.pmap.pcid_enabled to 0 could help.
>
> I'm seeing similar stability problems on an N95-based device. This too
> is an Alderlake-N device with only E-cores although I'm running it with
> a compilation with CPUTYPE=tremont .. from an older, verbose start-up ..


> PPIM 0: PA=0x40, VA=0x8271, size=0x1d5000, mode=0x1
> pmap: large map 8 PML4 slots (4096 GB)
> VT(efifb): resolution 800x600
> Preloaded elf kernel "/boot/kernel.new/kernel" at 0x8234e000.
> Preloaded boot_entropy_cache "/boot/entropy" at 0x82357d08.
> Preloaded cpu_microcode "/boot/firmware/intel-ucode.bin" at 0x82357d60.
> Preloaded hostuuid "/etc/hostid" at 0x82357dc0.
> Preloaded TSLOG data "TSLOG" at 0x82357e10.
> CPU: Intel(R) N95 (1689.60-MHz K8-class CPU)
>    Origin="GenuineIntel"  Id=0xb06e0  Family=0x6  Model=0xbe  Stepping=0
>    Features=0xbfebfbff
>    Features2=0x7ffafbbf
>    AMD Features=0x2c100800
>    AMD Features2=0x121
>    Structured Extended Features=0x239ca7eb
>    Structured Extended Features2=0x98c007bc
>    Structured Extended Features3=0xfc184410
>    XSAVE Features=0xf
>    IA32_ARCH_CAPS=0x180fd6b
>    VT-x: Basic Features=0x3da0500
>      Pin-Based Controls=0xff
>      Primary Processor Controls=0xfffbfffe
>      Secondary Processor Controls=0x75d7fff
>      Exit Controls=0x3da0500
>      Entry Controls=0x3da0500
>      EPT Features=0x6f34141
>      VPID Features=0xf01
>    TSC: P-state invariant, performance statistics
> 64-Byte prefetching
> L2 cache: 2048 kbytes, 16-way associative, 64 bytes/line
> real memory  = 17179869184 (16384 MB)
> Physical memory chunk(s):
> 0x0001 - 0x0009dfff, 581632 bytes (142 pages)
> 0x0009f000 - 0x0009, 4096 bytes (1 pages)
> 0x0010 - 0x5fff, 1609564160 bytes (392960 pages)
> 0x62401000 - 0x7264dfff, 270848000 bytes (66125 pages)
> 0x75fff000 - 0x75ff, 4096 bytes (1 pages)
> 0x00011000 - 0x000462497fff, 14533881856 bytes (3548311 pages)
> 0x00047fa0 - 0x00047fb68fff, 1478656 bytes (361 pages)
> avail memory = 16363008000 (15604 MB)
> CPU microcode: updated from 0xc to 0x10


With the most recent microcode update, this device reports ..

CPU microcode: updated from 0xc to 0x11

 .. and is now stable with vm.pmap.pcid_enabled=0, 
vm.pmap.pcid_invlpg_workaround=1, and CPUTYPE?=alderlake set in 
/etc/make.conf over multiple full system builds.


I have not tested with vm.pmap.pcid_invlpg_workaround=0.

> On start-up, vm.pmap.pcid_invlpg_workaround=1 but seemingly random
> faults still occurred under load, for example, 'make buildworld'.
> Apparent misreads of source-files resulting in syntax errors were the
> most common symptom. Compilation reattempts (mostly) succeed.
>
> Initially, I put this down to an inadequate power-supply but setting
> vm.pmap.pcid_enabled=0 seems to have stabilised it.
>
> I guess there's another dragon in there .. :-(
>
>  Michael





Re: [Intel AlderLake] Read files to FAT32 or UFS partition cause data corrupt due to P-Core-Core

2023-08-12 Thread Kevin Oberman
On Tue, Aug 8, 2023 at 10:50 AM Michael Butler wrote:

> On 8/8/23 10:56, Tomoaki AOKI wrote:
> > On Tue, 8 Aug 2023 17:02:32 +0300
> > Konstantin Belousov  wrote:
>
>   [ .. snip .. ]
>
> >> The workaround is switched on automatically, when kernel detects
> >> 'small cores' reported by CPUID.
> >
> > If I read the code correctly, vm.pmap.pcid_invlpg_workaround
> > (precisely, the corresponding variable) is set to non-zero when the
> > workaround is enabled. Not sure it was detected correctly in the
> > original reporter's environment, but forcibly setting the tunable to 1
> > was not reported to help sufficiently.
> > Currently, only setting tunable vm.pmap.pcid_enabled to 0 could help.
>
> I'm seeing similar stability problems on an N95-based device. This too
> is an Alderlake-N device with only E-cores although I'm running it with
> a compilation with CPUTYPE=tremont .. from an older, verbose start-up ..
>
> PPIM 0: PA=0x40, VA=0x8271, size=0x1d5000, mode=0x1
> pmap: large map 8 PML4 slots (4096 GB)
> VT(efifb): resolution 800x600
> Preloaded elf kernel "/boot/kernel.new/kernel" at 0x8234e000.
> Preloaded boot_entropy_cache "/boot/entropy" at 0x82357d08.
> Preloaded cpu_microcode "/boot/firmware/intel-ucode.bin" at 0x82357d60.
> Preloaded hostuuid "/etc/hostid" at 0x82357dc0.
> Preloaded TSLOG data "TSLOG" at 0x82357e10.
> CPU: Intel(R) N95 (1689.60-MHz K8-class CPU)
>    Origin="GenuineIntel"  Id=0xb06e0  Family=0x6  Model=0xbe  Stepping=0
>    Features=0xbfebfbff
>    Features2=0x7ffafbbf
>    AMD Features=0x2c100800
>    AMD Features2=0x121
>    Structured Extended Features=0x239ca7eb
>    Structured Extended Features2=0x98c007bc
>    Structured Extended Features3=0xfc184410
>    XSAVE Features=0xf
>    IA32_ARCH_CAPS=0x180fd6b
>    VT-x: Basic Features=0x3da0500
>      Pin-Based Controls=0xff
>      Primary Processor Controls=0xfffbfffe
>      Secondary Processor Controls=0x75d7fff
>      Exit Controls=0x3da0500
>      Entry Controls=0x3da0500
>      EPT Features=0x6f34141
>      VPID Features=0xf01
>    TSC: P-state invariant, performance statistics
> 64-Byte prefetching
> L2 cache: 2048 kbytes, 16-way associative, 64 bytes/line
> real memory  = 17179869184 (16384 MB)
> Physical memory chunk(s):
> 0x0001 - 0x0009dfff, 581632 bytes (142 pages)
> 0x0009f000 - 0x0009, 4096 bytes (1 pages)
> 0x0010 - 0x5fff, 1609564160 bytes (392960 pages)
> 0x62401000 - 0x7264dfff, 270848000 bytes (66125 pages)
> 0x75fff000 - 0x75ff, 4096 bytes (1 pages)
> 0x00011000 - 0x000462497fff, 14533881856 bytes (3548311 pages)
> 0x00047fa0 - 0x00047fb68fff, 1478656 bytes (361 pages)
> avail memory = 16363008000 (15604 MB)
> CPU microcode: updated from 0xc to 0x10
> MADT: Found CPU APIC ID 0 ACPI ID 0: enabled
> SMP: Added CPU 0 (AP)
> MADT: Found CPU APIC ID 2 ACPI ID 1: enabled
> SMP: Added CPU 2 (AP)
> MADT: Found CPU APIC ID 4 ACPI ID 2: enabled
> SMP: Added CPU 4 (AP)
> MADT: Found CPU APIC ID 6 ACPI ID 3: enabled
> SMP: Added CPU 6 (AP)
>
> On start-up, vm.pmap.pcid_invlpg_workaround=1 but seemingly random
> faults still occurred under load, for example, 'make buildworld'.
> Apparent misreads of source-files resulting in syntax errors were the
> most common symptom. Compilation reattempts (mostly) succeed.
>
> Initially, I put this down to an inadequate power-supply but setting
> vm.pmap.pcid_enabled=0 seems to have stabilised it.
>
> I guess there's another dragon in there .. :-(
>
> Michael
>

Just to add another report (in the wrong mail list as it is also on a
system running 13.2), I have a very similar system from a different
manufacturer with the same Alder Lake processor. I will note that the SSD
interface is SATA, not NVMe. I was getting crashes and corrupt file
systems, especially when installing large ports and using rsync to back up
the system. I see many almost identical systems on Amazon that use the
same form factor, CPU, SSD, RAM, etc., probably all using the same
motherboard from a single manufacturer. There are going to be more issues
as these boxes are generally <$225 US. (Mine was a bit more expensive to
get a VGA connector for my ancient monitor.)

I had not tried the tunable, but largely resolved the issue by installing
a 250 MB hard drive and putting the system there. In the couple of months
since I did this I have had two crashes, both when doing a full backup with
rsync. This leads me to think that there is some sort of race triggering
this that is minimized by the slow disk speed of spinning rust.

I am considering moving the system back to the SSD with
vm.pmap.pcid_enabled=0. If I do, any failure should show up very quickly, as
I never could keep the system up long enough to get it into production.
-- 
Kevin Oberman, Part time kid herder and retired Network 

Re: [Intel AlderLake] Read files to FAT32 or UFS partition cause data corrupt due to P-Core-Core

2023-08-08 Thread Michael Butler

On 8/8/23 10:56, Tomoaki AOKI wrote:
> On Tue, 8 Aug 2023 17:02:32 +0300
> Konstantin Belousov wrote:

 [ .. snip .. ]

>> The workaround is switched on automatically, when kernel detects
>> 'small cores' reported by CPUID.
>
> If I read the code correctly, vm.pmap.pcid_invlpg_workaround
> (precisely, the corresponding variable) is set to non-zero when the
> workaround is enabled. Not sure it was detected correctly in the
> original reporter's environment, but forcibly setting the tunable to 1
> was not reported to help sufficiently.
> Currently, only setting tunable vm.pmap.pcid_enabled to 0 could help.


I'm seeing similar stability problems on an N95-based device. This too 
is an Alderlake-N device with only E-cores although I'm running it with 
a compilation with CPUTYPE=tremont .. from an older, verbose start-up ..


PPIM 0: PA=0x40, VA=0x8271, size=0x1d5000, mode=0x1
pmap: large map 8 PML4 slots (4096 GB)
VT(efifb): resolution 800x600
Preloaded elf kernel "/boot/kernel.new/kernel" at 0x8234e000.
Preloaded boot_entropy_cache "/boot/entropy" at 0x82357d08.
Preloaded cpu_microcode "/boot/firmware/intel-ucode.bin" at 0x82357d60.
Preloaded hostuuid "/etc/hostid" at 0x82357dc0.
Preloaded TSLOG data "TSLOG" at 0x82357e10.
CPU: Intel(R) N95 (1689.60-MHz K8-class CPU)
   Origin="GenuineIntel"  Id=0xb06e0  Family=0x6  Model=0xbe  Stepping=0
   Features=0xbfebfbff
   Features2=0x7ffafbbf
   AMD Features=0x2c100800
   AMD Features2=0x121
   Structured Extended Features=0x239ca7eb
   Structured Extended Features2=0x98c007bc
   Structured Extended Features3=0xfc184410
   XSAVE Features=0xf
   IA32_ARCH_CAPS=0x180fd6b
   VT-x: Basic Features=0x3da0500
     Pin-Based Controls=0xff
     Primary Processor Controls=0xfffbfffe
     Secondary Processor Controls=0x75d7fff
     Exit Controls=0x3da0500
     Entry Controls=0x3da0500
     EPT Features=0x6f34141
     VPID Features=0xf01
   TSC: P-state invariant, performance statistics
64-Byte prefetching
L2 cache: 2048 kbytes, 16-way associative, 64 bytes/line
real memory  = 17179869184 (16384 MB)
Physical memory chunk(s):
0x0001 - 0x0009dfff, 581632 bytes (142 pages)
0x0009f000 - 0x0009, 4096 bytes (1 pages)
0x0010 - 0x5fff, 1609564160 bytes (392960 pages)
0x62401000 - 0x7264dfff, 270848000 bytes (66125 pages)
0x75fff000 - 0x75ff, 4096 bytes (1 pages)
0x00011000 - 0x000462497fff, 14533881856 bytes (3548311 pages)
0x00047fa0 - 0x00047fb68fff, 1478656 bytes (361 pages)
avail memory = 16363008000 (15604 MB)
CPU microcode: updated from 0xc to 0x10
MADT: Found CPU APIC ID 0 ACPI ID 0: enabled
SMP: Added CPU 0 (AP)
MADT: Found CPU APIC ID 2 ACPI ID 1: enabled
SMP: Added CPU 2 (AP)
MADT: Found CPU APIC ID 4 ACPI ID 2: enabled
SMP: Added CPU 4 (AP)
MADT: Found CPU APIC ID 6 ACPI ID 3: enabled
SMP: Added CPU 6 (AP)

On start-up, vm.pmap.pcid_invlpg_workaround=1 but seemingly random 
faults still occurred under load, for example, 'make buildworld'. 
Apparent misreads of source-files resulting in syntax errors were the 
most common symptom. Compilation reattempts (mostly) succeed.


Initially, I put this down to an inadequate power-supply but setting 
vm.pmap.pcid_enabled=0 seems to have stabilised it.


I guess there's another dragon in there .. :-(

Michael







Re: [Intel AlderLake] Read files to FAT32 or UFS partition cause data corrupt due to P-Core-Core

2023-08-08 Thread Tomoaki AOKI
On Tue, 8 Aug 2023 17:02:32 +0300
Konstantin Belousov  wrote:

> On Tue, Aug 08, 2023 at 10:46:12PM +0900, Tomoaki AOKI wrote:
> > On Tue, 8 Aug 2023 15:38:46 +0300
> > Konstantin Belousov  wrote:
> > 
> > > On Tue, Aug 08, 2023 at 06:37:35AM +0900, Tomoaki AOKI wrote:
> > > > On Sun, 6 Aug 2023 12:55:07 +0300
> > > > Konstantin Belousov  wrote:
> > > > 
> > > > > On Sun, Aug 06, 2023 at 06:12:38PM +0900, Tomoaki AOKI wrote:
> > > > > > On Wed, 23 Feb 2022 01:30:28 +0200
> > > > > > Konstantin Belousov  wrote:
> > > > > > 
> > > > > > > On Tue, Feb 22, 2022 at 06:23:17PM -0500, Alexander Motin wrote:
> > > > > > > > On 22.02.2022 17:46, Konstantin Belousov wrote:
> > > > > > > > > Ok, the next step is to get the CPU feature reports from P- 
> > > > > > > > > vs. E- cores.
> > > > > > > > > Patch below should work, with verbose boot.
> > > > > > > > 
> > > > > > > > Not much difference on that level:
> > > > > > > > 
> > > > > > > > --- zzzp2022-02-22 18:18:24.531704000 -0500
> > > > > > > > +++ zzze2022-02-22 18:18:18.631236000 -0500
> > > > > > > > @@ -1,22 +1,21 @@
> > > > > > > > -CPU 2: 12th Gen Intel(R) Core(TM) i7-12700K (3609.60-MHz 
> > > > > > > > K8-class CPU)
> > > > > > > > +CPU 16: 12th Gen Intel(R) Core(TM) i7-12700K (3609.60-MHz 
> > > > > > > > K8-class CPU)
> > > > > > > >Origin="GenuineIntel"  Id=0x90672  Family=0x6  Model=0x97  
> > > > > > > > Stepping=2
> > > > > > > > Features=0xbfebfbff
> > > > > > > > Features2=0x7ffafbff
> > > > > > > >AMD Features=0x2c100800
> > > > > > > >AMD Features2=0x121
> > > > > > > >Structured Extended 
> > > > > > > > Features=0x239ca7eb
> > > > > > > >Structured Extended 
> > > > > > > > Features2=0x98c027ac
> > > > > > > >Structured Extended 
> > > > > > > > Features3=0xfc1cc410
> > > > > > > >XSAVE Features=0xf
> > > > > > > >
> > > > > > > > IA32_ARCH_CAPS=0xd6b
> > > > > > > >VT-x: Basic Features=0x3da0500
> > > > > > > >  Pin-Based 
> > > > > > > > Controls=0xff
> > > > > > > >  Primary Processor 
> > > > > > > > Controls=0xfffbfffe
> > > > > > > >  Secondary Processor 
> > > > > > > > Controls=0xf5d7fff
> > > > > > > >  Exit Controls=0x3da0500
> > > > > > > >  Entry Controls=0x3da0500
> > > > > > > >  EPT 
> > > > > > > > Features=0x6f34141
> > > > > > > >  VPID 
> > > > > > > > Features=0x10f01
> > > > > > > >TSC: P-state invariant, performance statistics
> > > > > > > > -64-Byte prefetching
> > > > > > > > -L2 cache: 1280 kbytes, 8-way associative, 64 bytes/line
> > > > > > > > +L2 cache: 2048 kbytes, 16-way associative, 64 bytes/line
> > > > > > > > 
> > > > > > > 
> > > > > > > Show me the full verbose dmesg of the boot then.
> > > > > > > 
> > > > > > > As another blind guess, try to disable pcid, 
> > > > > > > vm.pmap.pcid_enabled=0.
> > > > > > > 
> > > > > > 
> > > > > > Hi.
> > > > > > 
> > > > > > Intel N100 is reported to crash without this tunable on 13.2 at
> > > > > > freebsd-users-jp ML (as this is a ML in Japanese, reported in
> > > > > > Japanese). [1]
> > > > > > Crashes with UFS, but ZFS is claimed to be OK.
> > > > > > 
> > > > > > N100 is an Alder Lake-N processor WITHOUT P-CORE. [2] [3]
> > > > > > So check logics on workaround codes (IIRC, all are MFC'ed before
> > > > > > 13.2) wouldn't be working?
> > > > > 
> > > > > Show me the output from x86info -r on the machine, I do not care which
> > > > > specific core it is, they should be all the same.  x86info is 
> > > > > available
> > > > > as sysutils/x86info.
> > > > 
> > > > Requested to original reporter and got the result below.
> > > > HTH.
> > > > 
> > > > ---
> > > > root@eq12:~ # x86info -r
> > > > x86info v1.31pre
> > > > /dev/cpuctl0: No such file or directory
> > > > Found 4 identical CPUs
> > > > Extended Family: 0 Extended Model: 11 Family: 6 Model: 190 Stepping: 0
> > > > Type: 0 (Original OEM)
> > > > CPU Model (x86info's best guess): Unknown model.
> > > ...
> > > > eax in: 0x001a, eax = 2001 ebx =  ecx =  edx = 
> > > > 
> > > 
> > > The CPU is reported as small core/atom, so the workaround is turned on.
> > > I do not think that the issue reported is related to the TLB/PG_G errata.
> > > 
> > > Why do you think that this is hw issue at all, and not some software bug
> > > in the build etc ?
> > 
> > Because the issue looks similar (crashes on UFS but not ZFS, and as far
> > as the original reporter tested, vm.pmap.pcid_enabled=0
> > in /boot/loader.conf helped).
> > 
> > Moreover, N100 CPU is Alder Lake-N. So potentially includes the same
> > design issue (common circuits, firmwares, ...).
> > 
> > So I suspected the same problem persists even without P-core and
> > advised the original reporter to add the workaround
> > in /boot/loader.conf.
> > It seems to help until now.
> The workaround is switched on automatically, when kernel detects 'small cores'
> reported by CPUID.

If I 

Re: [Intel AlderLake] Read files to FAT32 or UFS partition cause data corrupt due to P-Core-Core

2023-08-08 Thread Konstantin Belousov
On Tue, Aug 08, 2023 at 10:46:12PM +0900, Tomoaki AOKI wrote:
> On Tue, 8 Aug 2023 15:38:46 +0300
> Konstantin Belousov  wrote:
> 
> > On Tue, Aug 08, 2023 at 06:37:35AM +0900, Tomoaki AOKI wrote:
> > > On Sun, 6 Aug 2023 12:55:07 +0300
> > > Konstantin Belousov  wrote:
> > > 
> > > > On Sun, Aug 06, 2023 at 06:12:38PM +0900, Tomoaki AOKI wrote:
> > > > > On Wed, 23 Feb 2022 01:30:28 +0200
> > > > > Konstantin Belousov  wrote:
> > > > > 
> > > > > > On Tue, Feb 22, 2022 at 06:23:17PM -0500, Alexander Motin wrote:
> > > > > > > On 22.02.2022 17:46, Konstantin Belousov wrote:
> > > > > > > > Ok, the next step is to get the CPU feature reports from P- vs. 
> > > > > > > > E- cores.
> > > > > > > > Patch below should work, with verbose boot.
> > > > > > > 
> > > > > > > Not much difference on that level:
> > > > > > > 
> > > > > > > --- zzzp2022-02-22 18:18:24.531704000 -0500
> > > > > > > +++ zzze2022-02-22 18:18:18.631236000 -0500
> > > > > > > @@ -1,22 +1,21 @@
> > > > > > > -CPU 2: 12th Gen Intel(R) Core(TM) i7-12700K (3609.60-MHz 
> > > > > > > K8-class CPU)
> > > > > > > +CPU 16: 12th Gen Intel(R) Core(TM) i7-12700K (3609.60-MHz 
> > > > > > > K8-class CPU)
> > > > > > >Origin="GenuineIntel"  Id=0x90672  Family=0x6  Model=0x97  
> > > > > > > Stepping=2
> > > > > > > Features=0xbfebfbff
> > > > > > > Features2=0x7ffafbff
> > > > > > >AMD Features=0x2c100800
> > > > > > >AMD Features2=0x121
> > > > > > >Structured Extended 
> > > > > > > Features=0x239ca7eb
> > > > > > >Structured Extended 
> > > > > > > Features2=0x98c027ac
> > > > > > >Structured Extended 
> > > > > > > Features3=0xfc1cc410
> > > > > > >XSAVE Features=0xf
> > > > > > >
> > > > > > > IA32_ARCH_CAPS=0xd6b
> > > > > > >VT-x: Basic Features=0x3da0500
> > > > > > >  Pin-Based Controls=0xff
> > > > > > >  Primary Processor 
> > > > > > > Controls=0xfffbfffe
> > > > > > >  Secondary Processor 
> > > > > > > Controls=0xf5d7fff
> > > > > > >  Exit Controls=0x3da0500
> > > > > > >  Entry Controls=0x3da0500
> > > > > > >  EPT 
> > > > > > > Features=0x6f34141
> > > > > > >  VPID 
> > > > > > > Features=0x10f01
> > > > > > >TSC: P-state invariant, performance statistics
> > > > > > > -64-Byte prefetching
> > > > > > > -L2 cache: 1280 kbytes, 8-way associative, 64 bytes/line
> > > > > > > +L2 cache: 2048 kbytes, 16-way associative, 64 bytes/line
> > > > > > > 
> > > > > > 
> > > > > > Show me the full verbose dmesg of the boot then.
> > > > > > 
> > > > > > As another blind guess, try to disable pcid, vm.pmap.pcid_enabled=0.
> > > > > > 
> > > > > 
> > > > > Hi.
> > > > > 
> > > > > Intel N100 is reported to crash without this tunable on 13.2 at
> > > > > freebsd-users-jp ML (as this is a ML in Japanese, reported in
> > > > > Japanese). [1]
> > > > > Crashes with UFS, but ZFS is claimed to be OK.
> > > > > 
> > > > > N100 is an Alder Lake-N processor WITHOUT P-CORE. [2] [3]
> > > > > So check logics on workaround codes (IIRC, all are MFC'ed before
> > > > > 13.2) wouldn't be working?
> > > > 
> > > > Show me the output from x86info -r on the machine, I do not care which
> > > > specific core it is, they should be all the same.  x86info is available
> > > > as sysutils/x86info.
> > > 
> > > Requested to original reporter and got the result below.
> > > HTH.
> > > 
> > > ---
> > > root@eq12:~ # x86info -r
> > > x86info v1.31pre
> > > /dev/cpuctl0: No such file or directory
> > > Found 4 identical CPUs
> > > Extended Family: 0 Extended Model: 11 Family: 6 Model: 190 Stepping: 0
> > > Type: 0 (Original OEM)
> > > CPU Model (x86info's best guess): Unknown model.
> > ...
> > > eax in: 0x001a, eax = 2001 ebx =  ecx =  edx = 
> > > 
> > 
> > The CPU is reported as small core/atom, so the workaround is turned on.
> > I do not think that the issue reported is related to the TLB/PG_G errata.
> > 
> > Why do you think that this is hw issue at all, and not some software bug
> > in the build etc ?
> 
> Because the issue looks similar (crashes on UFS but not ZFS, and as far
> as the original reporter tested, vm.pmap.pcid_enabled=0
> in /boot/loader.conf helped).
> 
> Moreover, N100 CPU is Alder Lake-N. So potentially includes the same
> design issue (common circuits, firmwares, ...).
> 
> So I suspected the same problem persists even without P-core and
> advised the original reporter to add the workaround
> in /boot/loader.conf.
> It seems to help until now.
The workaround is switched on automatically, when kernel detects 'small cores'
reported by CPUID.
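
As a rough illustration of what "small cores reported by CPUID" can mean:
the Intel SDM documents a core-type field in bits 31:24 of EAX for CPUID
leaf 0x1A, with 0x20 for Atom cores and 0x40 for Core cores.  The sketch
below only decodes a value that has already been read from that leaf.  The
helper names are hypothetical, and whether the kernel's detection performs
exactly this check is an assumption, not something stated in this thread.

```c
#include <stdint.h>

/* Core type values from CPUID leaf 0x1A, EAX bits 31:24 (per the Intel SDM). */
#define CORE_TYPE_ATOM	0x20u	/* "small" (E-) core */
#define CORE_TYPE_CORE	0x40u	/* "big" (P-) core */

/* Extract the core-type field from the EAX value returned by leaf 0x1A. */
static inline uint32_t
hybrid_core_type(uint32_t leaf_1a_eax)
{
	return (leaf_1a_eax >> 24) & 0xffu;
}

/* Nonzero when the EAX value describes a small (Atom) core. */
static inline int
is_small_core(uint32_t leaf_1a_eax)
{
	return hybrid_core_type(leaf_1a_eax) == CORE_TYPE_ATOM;
}
```

For example, an EAX value of 0x20000001 decodes to core type 0x20, so
is_small_core() returns nonzero for it, while 0x40000001 decodes to 0x40.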



Re: [Intel AlderLake] Read files to FAT32 or UFS partition cause data corrupt due to P-Core-Core

2023-08-08 Thread Tomoaki AOKI
On Tue, 8 Aug 2023 15:38:46 +0300
Konstantin Belousov  wrote:

> On Tue, Aug 08, 2023 at 06:37:35AM +0900, Tomoaki AOKI wrote:
> > On Sun, 6 Aug 2023 12:55:07 +0300
> > Konstantin Belousov  wrote:
> > 
> > > On Sun, Aug 06, 2023 at 06:12:38PM +0900, Tomoaki AOKI wrote:
> > > > On Wed, 23 Feb 2022 01:30:28 +0200
> > > > Konstantin Belousov  wrote:
> > > > 
> > > > > On Tue, Feb 22, 2022 at 06:23:17PM -0500, Alexander Motin wrote:
> > > > > > On 22.02.2022 17:46, Konstantin Belousov wrote:
> > > > > > > Ok, the next step is to get the CPU feature reports from P- vs. 
> > > > > > > E- cores.
> > > > > > > Patch below should work, with verbose boot.
> > > > > > 
> > > > > > Not much difference on that level:
> > > > > > 
> > > > > > --- zzzp2022-02-22 18:18:24.531704000 -0500
> > > > > > +++ zzze2022-02-22 18:18:18.631236000 -0500
> > > > > > @@ -1,22 +1,21 @@
> > > > > > -CPU 2: 12th Gen Intel(R) Core(TM) i7-12700K (3609.60-MHz K8-class 
> > > > > > CPU)
> > > > > > +CPU 16: 12th Gen Intel(R) Core(TM) i7-12700K (3609.60-MHz K8-class 
> > > > > > CPU)
> > > > > >Origin="GenuineIntel"  Id=0x90672  Family=0x6  Model=0x97  
> > > > > > Stepping=2
> > > > > > Features=0xbfebfbff
> > > > > > Features2=0x7ffafbff
> > > > > >AMD Features=0x2c100800
> > > > > >AMD Features2=0x121
> > > > > >Structured Extended 
> > > > > > Features=0x239ca7eb
> > > > > >Structured Extended 
> > > > > > Features2=0x98c027ac
> > > > > >Structured Extended 
> > > > > > Features3=0xfc1cc410
> > > > > >XSAVE Features=0xf
> > > > > >
> > > > > > IA32_ARCH_CAPS=0xd6b
> > > > > >VT-x: Basic Features=0x3da0500
> > > > > >  Pin-Based Controls=0xff
> > > > > >  Primary Processor 
> > > > > > Controls=0xfffbfffe
> > > > > >  Secondary Processor 
> > > > > > Controls=0xf5d7fff
> > > > > >  Exit Controls=0x3da0500
> > > > > >  Entry Controls=0x3da0500
> > > > > >  EPT 
> > > > > > Features=0x6f34141
> > > > > >  VPID 
> > > > > > Features=0x10f01
> > > > > >TSC: P-state invariant, performance statistics
> > > > > > -64-Byte prefetching
> > > > > > -L2 cache: 1280 kbytes, 8-way associative, 64 bytes/line
> > > > > > +L2 cache: 2048 kbytes, 16-way associative, 64 bytes/line
> > > > > > 
> > > > > 
> > > > > Show me the full verbose dmesg of the boot then.
> > > > > 
> > > > > As another blind guess, try to disable pcid, vm.pmap.pcid_enabled=0.
> > > > > 
> > > > 
> > > > Hi.
> > > > 
> > > > Intel N100 is reported to crash without this tunable on 13.2 at
> > > > freebsd-users-jp ML (as this is a ML in Japanese, reported in
> > > > Japanese). [1]
> > > > Crashes with UFS, but ZFS is claimed to be OK.
> > > > 
> > > > N100 is an Alder Lake-N processor WITHOUT P-CORE. [2] [3]
> > > > So check logics on workaround codes (IIRC, all are MFC'ed before 13.2)
> > > > wouldn't be working?
> > > 
> > > Show me the output from x86info -r on the machine, I do not care which
> > > specific core it is, they should be all the same.  x86info is available
> > > as sysutils/x86info.
> > 
> > Requested to original reporter and got the result below.
> > HTH.
> > 
> > ---
> > root@eq12:~ # x86info -r
> > x86info v1.31pre
> > /dev/cpuctl0: No such file or directory
> > Found 4 identical CPUs
> > Extended Family: 0 Extended Model: 11 Family: 6 Model: 190 Stepping: 0
> > Type: 0 (Original OEM)
> > CPU Model (x86info's best guess): Unknown model.
> ...
> > eax in: 0x001a, eax = 2001 ebx =  ecx =  edx = 
> > 
> 
> The CPU is reported as small core/atom, so the workaround is turned on.
> I do not think that the issue reported is related to the TLB/PG_G errata.
> 
> Why do you think that this is hw issue at all, and not some software bug
> in the build etc ?

Because the issue looks similar (crashes on UFS but not ZFS, and as far
as the original reporter tested, vm.pmap.pcid_enabled=0
in /boot/loader.conf helped).

Moreover, N100 CPU is Alder Lake-N. So potentially includes the same
design issue (common circuits, firmwares, ...).

So I suspected the same problem persists even without P-core and
advised the original reporter to add the workaround
in /boot/loader.conf.
It seems to help until now.

-- 
Tomoaki AOKI



Re: [Intel AlderLake] Read files to FAT32 or UFS partition cause data corrupt due to P-Core-Core

2023-08-08 Thread Konstantin Belousov
On Tue, Aug 08, 2023 at 06:37:35AM +0900, Tomoaki AOKI wrote:
> On Sun, 6 Aug 2023 12:55:07 +0300
> Konstantin Belousov  wrote:
> 
> > On Sun, Aug 06, 2023 at 06:12:38PM +0900, Tomoaki AOKI wrote:
> > > On Wed, 23 Feb 2022 01:30:28 +0200
> > > Konstantin Belousov  wrote:
> > > 
> > > > On Tue, Feb 22, 2022 at 06:23:17PM -0500, Alexander Motin wrote:
> > > > > On 22.02.2022 17:46, Konstantin Belousov wrote:
> > > > > > Ok, the next step is to get the CPU feature reports from P- vs. E- 
> > > > > > cores.
> > > > > > Patch below should work, with verbose boot.
> > > > > 
> > > > > Not much difference on that level:
> > > > > 
> > > > > --- zzzp2022-02-22 18:18:24.531704000 -0500
> > > > > +++ zzze2022-02-22 18:18:18.631236000 -0500
> > > > > @@ -1,22 +1,21 @@
> > > > > -CPU 2: 12th Gen Intel(R) Core(TM) i7-12700K (3609.60-MHz K8-class 
> > > > > CPU)
> > > > > +CPU 16: 12th Gen Intel(R) Core(TM) i7-12700K (3609.60-MHz K8-class 
> > > > > CPU)
> > > > >Origin="GenuineIntel"  Id=0x90672  Family=0x6  Model=0x97  
> > > > > Stepping=2
> > > > > Features=0xbfebfbff
> > > > > Features2=0x7ffafbff
> > > > >AMD Features=0x2c100800
> > > > >AMD Features2=0x121
> > > > >Structured Extended 
> > > > > Features=0x239ca7eb
> > > > >Structured Extended 
> > > > > Features2=0x98c027ac
> > > > >Structured Extended 
> > > > > Features3=0xfc1cc410
> > > > >XSAVE Features=0xf
> > > > >IA32_ARCH_CAPS=0xd6b
> > > > >VT-x: Basic Features=0x3da0500
> > > > >  Pin-Based Controls=0xff
> > > > >  Primary Processor 
> > > > > Controls=0xfffbfffe
> > > > >  Secondary Processor 
> > > > > Controls=0xf5d7fff
> > > > >  Exit Controls=0x3da0500
> > > > >  Entry Controls=0x3da0500
> > > > >  EPT 
> > > > > Features=0x6f34141
> > > > >  VPID 
> > > > > Features=0x10f01
> > > > >TSC: P-state invariant, performance statistics
> > > > > -64-Byte prefetching
> > > > > -L2 cache: 1280 kbytes, 8-way associative, 64 bytes/line
> > > > > +L2 cache: 2048 kbytes, 16-way associative, 64 bytes/line
> > > > > 
> > > > 
> > > > Show me the full verbose dmesg of the boot then.
> > > > 
> > > > As another blind guess, try to disable pcid, vm.pmap.pcid_enabled=0.
> > > > 
> > > 
> > > Hi.
> > > 
> > > Intel N100 is reported to crash without this tunable on 13.2 at
> > > freebsd-users-jp ML (as this is a ML in Japanese, reported in
> > > Japanese). [1]
> > > Crashes with UFS, but ZFS is claimed to be OK.
> > > 
> > > N100 is an Alder Lake-N processor WITHOUT P-CORE. [2] [3]
> > > So check logics on workaround codes (IIRC, all are MFC'ed before 13.2)
> > > wouldn't be working?
> > 
> > Show me the output from x86info -r on the machine, I do not care which
> > specific core it is, they should be all the same.  x86info is available
> > as sysutils/x86info.
> 
> Requested to original reporter and got the result below.
> HTH.
> 
> ---
> root@eq12:~ # x86info -r
> x86info v1.31pre
> /dev/cpuctl0: No such file or directory
> Found 4 identical CPUs
> Extended Family: 0 Extended Model: 11 Family: 6 Model: 190 Stepping: 0
> Type: 0 (Original OEM)
> CPU Model (x86info's best guess): Unknown model.
...
> eax in: 0x001a, eax = 2001 ebx =  ecx =  edx = 
> 

The CPU is reported as small core/atom, so the workaround is turned on.
I do not think that the issue reported is related to the TLB/PG_G errata.

Why do you think that this is a hw issue at all, and not some software bug
in the build etc.?
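
For reference, the small-core classification mentioned above comes from CPUID
leaf 0x1a, where, per the Intel SDM, bits 31:24 of EAX hold the core type
(0x20 for Atom, 0x40 for Core). A minimal sh sketch of that decoding follows.
The function name core_type is hypothetical, and the sample value 0x20000001
is an illustrative assumption, since the x86info line quoted above is
truncated.

```sh
#!/bin/sh
# Hypothetical helper: classify a core from the EAX value of CPUID leaf 0x1a.
# Bits 31:24 of EAX hold the core type (0x20 = Atom, 0x40 = Core).
core_type() {
    eax=$(( $1 ))
    case $(( (eax >> 24) & 0xff )) in
        32) echo "atom" ;;      # 0x20, small core
        64) echo "core" ;;      # 0x40, big core
        *)  echo "unknown" ;;   # leaf absent or some other type
    esac
}

# Illustrative value only, not taken from the reporter's dump.
core_type 0x20000001
```

On a hybrid part this would have to be evaluated per core, since P- and
E-cores report different types.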



Re: [Intel AlderLake] Read files to FAT32 or UFS partition cause data corrupt due to P-Core-Core

2023-08-07 Thread Tomoaki AOKI
On Sun, 6 Aug 2023 12:55:07 +0300
Konstantin Belousov  wrote:

> On Sun, Aug 06, 2023 at 06:12:38PM +0900, Tomoaki AOKI wrote:
> > On Wed, 23 Feb 2022 01:30:28 +0200
> > Konstantin Belousov  wrote:
> > 
> > > On Tue, Feb 22, 2022 at 06:23:17PM -0500, Alexander Motin wrote:
> > > > On 22.02.2022 17:46, Konstantin Belousov wrote:
> > > > > Ok, the next step is to get the CPU feature reports from P- vs. E- 
> > > > > cores.
> > > > > Patch below should work, with verbose boot.
> > > > 
> > > > Not much difference on that level:
> > > > 
> > > > --- zzzp2022-02-22 18:18:24.531704000 -0500
> > > > +++ zzze2022-02-22 18:18:18.631236000 -0500
> > > > @@ -1,22 +1,21 @@
> > > > -CPU 2: 12th Gen Intel(R) Core(TM) i7-12700K (3609.60-MHz K8-class CPU)
> > > > +CPU 16: 12th Gen Intel(R) Core(TM) i7-12700K (3609.60-MHz K8-class CPU)
> > > >Origin="GenuineIntel"  Id=0x90672  Family=0x6  Model=0x97  Stepping=2
> > > > Features=0xbfebfbff
> > > > Features2=0x7ffafbff
> > > >AMD Features=0x2c100800
> > > >AMD Features2=0x121
> > > >Structured Extended 
> > > > Features=0x239ca7eb
> > > >Structured Extended 
> > > > Features2=0x98c027ac
> > > >Structured Extended 
> > > > Features3=0xfc1cc410
> > > >XSAVE Features=0xf
> > > >IA32_ARCH_CAPS=0xd6b
> > > >VT-x: Basic Features=0x3da0500
> > > >  Pin-Based Controls=0xff
> > > >  Primary Processor 
> > > > Controls=0xfffbfffe
> > > >  Secondary Processor 
> > > > Controls=0xf5d7fff
> > > >  Exit Controls=0x3da0500
> > > >  Entry Controls=0x3da0500
> > > >  EPT Features=0x6f34141
> > > >  VPID 
> > > > Features=0x10f01
> > > >TSC: P-state invariant, performance statistics
> > > > -64-Byte prefetching
> > > > -L2 cache: 1280 kbytes, 8-way associative, 64 bytes/line
> > > > +L2 cache: 2048 kbytes, 16-way associative, 64 bytes/line
> > > > 
> > > 
> > > Show me the full verbose dmesg of the boot then.
> > > 
> > > As another blind guess, try to disable pcid, vm.pmap.pcid_enabled=0.
> > > 
> > 
> > Hi.
> > 
> > Intel N100 is reported to crash without this tunable on 13.2 at
> > freebsd-users-jp ML (as this is a ML in Japanese, reported in
> > Japanese). [1]
> > Crashes with UFS, but ZFS is claimed to be OK.
> > 
> > N100 is an Alder Lake-N processor WITHOUT P-CORE. [2] [3]
> > So check logics on workaround codes (IIRC, all are MFC'ed before 13.2)
> > wouldn't be working?
> 
> Show me the output from x86info -r on the machine, I do not care which
> specific core it is, they should be all the same.  x86info is available
> as sysutils/x86info.

Requested to original reporter and got the result below.
HTH.

---
root@eq12:~ # x86info -r
x86info v1.31pre
/dev/cpuctl0: No such file or directory
Found 4 identical CPUs
Extended Family: 0 Extended Model: 11 Family: 6 Model: 190 Stepping: 0
Type: 0 (Original OEM)
CPU Model (x86info's best guess): Unknown model.
eax in: 0x, eax = 0020 ebx = 756e6547 ecx = 6c65746e edx =
49656e69
eax in: 0x0001, eax = 000b06e0 ebx = 00800800 ecx = 7ffafbbf edx =
bfebfbff
eax in: 0x0002, eax = 00feff01 ebx = 00f0 ecx =  edx =

eax in: 0x0003, eax =  ebx =  ecx =  edx =

eax in: 0x0004, eax = fc004121 ebx = 01c0003f ecx = 003f edx =

eax in: 0x0005, eax = 0040 ebx = 0040 ecx = 0003 edx =
10102020
eax in: 0x0006, eax = 00578ff7 ebx = 0002 ecx = 0009 edx =

eax in: 0x0007, eax = 0002 ebx = 239ca7eb ecx = 98c007bc edx =
fc184410
eax in: 0x0008, eax =  ebx =  ecx =  edx =

eax in: 0x0009, eax =  ebx =  ecx =  edx =

eax in: 0x000a, eax = 07300605 ebx =  ecx = 0007 edx =
8603
eax in: 0x000b, eax = 0001 ebx = 0001 ecx = 0100 edx =

eax in: 0x000c, eax =  ebx =  ecx =  edx =

eax in: 0x000d, eax = 0207 ebx = 0a88 ecx = 0a88 edx =

eax in: 0x000e, eax =  ebx =  ecx =  edx =

eax in: 0x000f, eax =  ebx =  ecx =  edx =

eax in: 0x0010, eax =  ebx = 0004 ecx =  edx =

eax in: 0x0011, eax =  ebx =  ecx =  edx =

eax in: 0x0012, eax =  ebx =  ecx =  edx =

eax in: 0x0013, eax =  ebx =  ecx =  edx =

eax in: 0x0014, eax = 0001 ebx = 01ff ecx = 8007 edx =

eax in: 0x0015, eax = 0002 ebx = 002a ecx = 0249f000 edx =

eax in: 0x0016, eax = 0320 ebx = 0d48 ecx = 0064 edx =

eax in: 0x0017, eax =  ebx =  ecx =  edx =

eax in: 0x0018, eax = 0004 ebx =  ecx =  edx =

eax in: 0x0019, eax = 0007 ebx = 

Re: [Intel AlderLake] Read files to FAT32 or UFS partition cause data corrupt due to P-Core-Core

2023-08-06 Thread Konstantin Belousov
On Sun, Aug 06, 2023 at 06:12:38PM +0900, Tomoaki AOKI wrote:
> On Wed, 23 Feb 2022 01:30:28 +0200
> Konstantin Belousov  wrote:
> 
> > On Tue, Feb 22, 2022 at 06:23:17PM -0500, Alexander Motin wrote:
> > > On 22.02.2022 17:46, Konstantin Belousov wrote:
> > > > Ok, the next step is to get the CPU feature reports from P- vs. E- 
> > > > cores.
> > > > Patch below should work, with verbose boot.
> > > 
> > > Not much difference on that level:
> > > 
> > > --- zzzp2022-02-22 18:18:24.531704000 -0500
> > > +++ zzze2022-02-22 18:18:18.631236000 -0500
> > > @@ -1,22 +1,21 @@
> > > -CPU 2: 12th Gen Intel(R) Core(TM) i7-12700K (3609.60-MHz K8-class CPU)
> > > +CPU 16: 12th Gen Intel(R) Core(TM) i7-12700K (3609.60-MHz K8-class CPU)
> > >Origin="GenuineIntel"  Id=0x90672  Family=0x6  Model=0x97  Stepping=2
> > > Features=0xbfebfbff
> > > Features2=0x7ffafbff
> > >AMD Features=0x2c100800
> > >AMD Features2=0x121
> > >Structured Extended 
> > > Features=0x239ca7eb
> > >Structured Extended 
> > > Features2=0x98c027ac
> > >Structured Extended 
> > > Features3=0xfc1cc410
> > >XSAVE Features=0xf
> > >IA32_ARCH_CAPS=0xd6b
> > >VT-x: Basic Features=0x3da0500
> > >  Pin-Based Controls=0xff
> > >  Primary Processor 
> > > Controls=0xfffbfffe
> > >  Secondary Processor 
> > > Controls=0xf5d7fff
> > >  Exit Controls=0x3da0500
> > >  Entry Controls=0x3da0500
> > >  EPT Features=0x6f34141
> > >  VPID 
> > > Features=0x10f01
> > >TSC: P-state invariant, performance statistics
> > > -64-Byte prefetching
> > > -L2 cache: 1280 kbytes, 8-way associative, 64 bytes/line
> > > +L2 cache: 2048 kbytes, 16-way associative, 64 bytes/line
> > > 
> > 
> > Show me the full verbose dmesg of the boot then.
> > 
> > As another blind guess, try to disable pcid, vm.pmap.pcid_enabled=0.
> > 
> 
> Hi.
> 
> Intel N100 is reported to crash without this tunable on 13.2 at
> freebsd-users-jp ML (as this is a ML in Japanese, reported in
> Japanese). [1]
> Crashes with UFS, but ZFS is claimed to be OK.
> 
> N100 is an Alder Lake-N processor WITHOUT P-CORE. [2] [3]
> So check logics on workaround codes (IIRC, all are MFC'ed before 13.2)
> wouldn't be working?

Show me the output from x86info -r on the machine, I do not care which
specific core it is, they should be all the same.  x86info is available
as sysutils/x86info.



Re: [Intel AlderLake] Read files to FAT32 or UFS partition cause data corrupt due to P-Core-Core

2023-08-06 Thread Tomoaki AOKI
On Wed, 23 Feb 2022 01:30:28 +0200
Konstantin Belousov  wrote:

> On Tue, Feb 22, 2022 at 06:23:17PM -0500, Alexander Motin wrote:
> > On 22.02.2022 17:46, Konstantin Belousov wrote:
> > > Ok, the next step is to get the CPU feature reports from P- vs. E- cores.
> > > Patch below should work, with verbose boot.
> > 
> > Not much difference on that level:
> > 
> > --- zzzp2022-02-22 18:18:24.531704000 -0500
> > +++ zzze2022-02-22 18:18:18.631236000 -0500
> > @@ -1,22 +1,21 @@
> > -CPU 2: 12th Gen Intel(R) Core(TM) i7-12700K (3609.60-MHz K8-class CPU)
> > +CPU 16: 12th Gen Intel(R) Core(TM) i7-12700K (3609.60-MHz K8-class CPU)
> >Origin="GenuineIntel"  Id=0x90672  Family=0x6  Model=0x97  Stepping=2
> > Features=0xbfebfbff
> > Features2=0x7ffafbff
> >AMD Features=0x2c100800
> >AMD Features2=0x121
> >Structured Extended 
> > Features=0x239ca7eb
> >Structured Extended 
> > Features2=0x98c027ac
> >Structured Extended 
> > Features3=0xfc1cc410
> >XSAVE Features=0xf
> >IA32_ARCH_CAPS=0xd6b
> >VT-x: Basic Features=0x3da0500
> >  Pin-Based Controls=0xff
> >  Primary Processor 
> > Controls=0xfffbfffe
> >  Secondary Processor 
> > Controls=0xf5d7fff
> >  Exit Controls=0x3da0500
> >  Entry Controls=0x3da0500
> >  EPT Features=0x6f34141
> >  VPID Features=0x10f01
> >TSC: P-state invariant, performance statistics
> > -64-Byte prefetching
> > -L2 cache: 1280 kbytes, 8-way associative, 64 bytes/line
> > +L2 cache: 2048 kbytes, 16-way associative, 64 bytes/line
> > 
> 
> Show me the full verbose dmesg of the boot then.
> 
> As another blind guess, try to disable pcid, vm.pmap.pcid_enabled=0.
> 

Hi.

Intel N100 is reported to crash without this tunable on 13.2 at
freebsd-users-jp ML (as this is a ML in Japanese, reported in
Japanese). [1]
Crashes with UFS, but ZFS is claimed to be OK.

N100 is an Alder Lake-N processor WITHOUT P-CORE. [2] [3]
So check logics on workaround codes (IIRC, all are MFC'ed before 13.2)
wouldn't be working?

Sorry, I'm just a liaison here and do not have any of the affected
hardware (which means I cannot test anything myself).

The reporter claims that the actual hardware is as follows.

 Beelink EQ12 intel N100, 16GB-DDR5,512GB M.2 SSD [4]

Working states are reported as follows.

 Installed to ZFS: OK

 Stock 13.2 without any custom tunable:
  Crash with UFS

 Tunable vm.pmap.pcid_enabled=0 in /boot/loader.conf:
  No crash reported with the procedure below.
   *Add "vm.pmap.pcid_enabled=0" to /boot/loader.conf (on the installed
    geom) just after the installation finished.
   *Reboot to the installed partition.
   *`portsnap fetch extract`
   *`cd /usr/ports/ports-mgmt/portmaster ; make install clean`
   *`portmaster www/apache24`
  All other operations after the above are claimed to be OK.

 Tunable vm.pmap.pcid_invlpg_workaround=1 instead of the above:
  Better than stock 13.2, but crashes at the `portmaster www/apache24`
  step of the procedure above.


[1]
https://lists.freebsd.org/archives/freebsd-users-jp/2023-July/000205.html

[2]
https://www.intel.com/content/www/us/en/products/sku/231803/intel-processor-n100-6m-cache-up-to-3-40-ghz/specifications.html

[3] https://en.wikipedia.org/wiki/Alder_Lake

[4] https://www.bee-link.com/eq12-n100-clone-1-82615581


Regards.

-- 
Tomoaki AOKI



RE: [Intel AlderLake] Read files to FAT32 or UFS partition cause data corrupt due to P-Core-Core

2022-03-07 Thread Chen, Alvin W
Hi guys,
Any progress on this issue?



Regards,
Alvin Chen
Dell | Comercial Client Group
office +86-10-82862506, fax +86-10-82861554, Dell Lync 8672506 
weike_c...@dell.com


Internal Use - Confidential

-----Original Message-----
From: Konstantin Belousov
Sent: February 24, 2022 9:24
To: Alexander Motin
Cc: Mike Karels; Tomoaki AOKI; Chen, Alvin W; freebsd-current@freebsd.org
Subject: Re: [Intel AlderLake] Read files to FAT32 or UFS partition cause 
data corrupt due to P-Core



On Wed, Feb 23, 2022 at 12:25:24PM -0500, Alexander Motin wrote:
> On 22.02.2022 19:00, Konstantin Belousov wrote:
> > On Tue, Feb 22, 2022 at 06:53:09PM -0500, Alexander Motin wrote:
> > > On 22.02.2022 18:41, Konstantin Belousov wrote:
> > > > On Tue, Feb 22, 2022 at 06:38:24PM -0500, Alexander Motin wrote:
> > > > > On 22.02.2022 18:30, Konstantin Belousov wrote:
> > > > > > As another blind guess, try to disable pcid, vm.pmap.pcid_enabled=0.
> > > > > 
> > > > > Do you mean it to be a workaround for TrueNAS 12, or it should 
> > > > > provide some information?  The system is at the office and has 
> > > > > no IPMI, so I can't switch the boot device from home right now.
> > > > I intended to see if it is the cause or related feature.
> > > 
> > > I'll try that on the 12 tomorrow, if applicable.
> > 
> > Yes should be relevant still.
> 
> It did the trick.  I repeated several times successful boots with the 
> pcid disabled, and failed ones with default enabled.  In attachment 
> you may find verbose serial console output captures with pcid disabled 
> and enabled, though without the cpuinfo patch.  During the testing I 
> had only one P and one E cores enabled to reduce noise.  Only after 
> that I found P core having SMT enabled, but I then repeated without 
> SMT also, so it is indeed irrelevant.
> 
> I'm curious, what in pcid could differentiate the P and E cores, and
> have it got fixed in latest stable/13, or I am just "unlucky" to not 
> reproduce it there?

I am curious as well.  PCID works on both big Intel cores, and on small cores
like Apollo Lake etc.  So the fact that it does not properly interact in P/E
settings means either that there is something I did not account for from the
spec, or that there is a bug in silicon.

I have no idea why we work on stable/13 and HEAD.  There were enough changes
to PCID code there, but it was mostly restructuring and polishing.

So the only way to get more understanding is to bisect to see which commit on 
HEAD fixed the boot.



Re: [Intel AlderLake] Read files to FAT32 or UFS partition cause data corrupt due to P-Core-Core

2022-02-23 Thread Konstantin Belousov
On Wed, Feb 23, 2022 at 12:25:24PM -0500, Alexander Motin wrote:
> On 22.02.2022 19:00, Konstantin Belousov wrote:
> > On Tue, Feb 22, 2022 at 06:53:09PM -0500, Alexander Motin wrote:
> > > On 22.02.2022 18:41, Konstantin Belousov wrote:
> > > > On Tue, Feb 22, 2022 at 06:38:24PM -0500, Alexander Motin wrote:
> > > > > On 22.02.2022 18:30, Konstantin Belousov wrote:
> > > > > > As another blind guess, try to disable pcid, vm.pmap.pcid_enabled=0.
> > > > > 
> > > > > Do you mean it to be a workaround for TrueNAS 12, or it should 
> > > > > provide some
> > > > > information?  The system is at the office and has no IPMI, so I can't 
> > > > > switch
> > > > > the boot device from home right now.
> > > > I intended to see if it is the cause or related feature.
> > > 
> > > I'll try that on the 12 tomorrow, if applicable.
> > 
> > Yes should be relevant still.
> 
> It did the trick.  I repeated several times successful boots with the pcid
> disabled, and failed ones with default enabled.  In attachment you may find
> verbose serial console output captures with pcid disabled and enabled,
> though without the cpuinfo patch.  During the testing I had only one P and
> one E cores enabled to reduce noise.  Only after that I found P core having
> SMT enabled, but I then repeated without SMT also, so it is indeed
> irrelevant.
> 
> I'm curious, what in pcid could differentiate the P and E cores, and have it
> got fixed in latest stable/13, or I am just "unlucky" to not reproduce it
> there?

I am curious as well.  PCID works on both big Intel cores, and on small
cores like Apollo Lake etc.  So the fact that it does not properly interact
in P/E settings means either that there is something I did not account
for from the spec, or that there is a bug in silicon.

I have no idea why we work on stable/13 and HEAD.  There were enough
changes to PCID code there, but it was mostly restructuring and polishing.

So the only way to get more understanding is to bisect to see which commit
on HEAD fixed the boot.



Re: [Intel AlderLake] Read files to FAT32 or UFS partition cause data corrupt due to P-Core-Core

2022-02-22 Thread Konstantin Belousov
On Tue, Feb 22, 2022 at 06:23:17PM -0500, Alexander Motin wrote:
> On 22.02.2022 17:46, Konstantin Belousov wrote:
> > Ok, the next step is to get the CPU feature reports from P- vs. E- cores.
> > Patch below should work, with verbose boot.
> 
> Not much difference on that level:
> 
> --- zzzp2022-02-22 18:18:24.531704000 -0500
> +++ zzze2022-02-22 18:18:18.631236000 -0500
> @@ -1,22 +1,21 @@
> -CPU 2: 12th Gen Intel(R) Core(TM) i7-12700K (3609.60-MHz K8-class CPU)
> +CPU 16: 12th Gen Intel(R) Core(TM) i7-12700K (3609.60-MHz K8-class CPU)
>Origin="GenuineIntel"  Id=0x90672  Family=0x6  Model=0x97  Stepping=2
> Features=0xbfebfbff
> Features2=0x7ffafbff
>AMD Features=0x2c100800
>AMD Features2=0x121
>Structured Extended 
> Features=0x239ca7eb
>Structured Extended 
> Features2=0x98c027ac
>Structured Extended 
> Features3=0xfc1cc410
>XSAVE Features=0xf
>IA32_ARCH_CAPS=0xd6b
>VT-x: Basic Features=0x3da0500
>  Pin-Based Controls=0xff
>  Primary Processor 
> Controls=0xfffbfffe
>  Secondary Processor 
> Controls=0xf5d7fff
>  Exit Controls=0x3da0500
>  Entry Controls=0x3da0500
>  EPT Features=0x6f34141
>  VPID Features=0x10f01
>TSC: P-state invariant, performance statistics
> -64-Byte prefetching
> -L2 cache: 1280 kbytes, 8-way associative, 64 bytes/line
> +L2 cache: 2048 kbytes, 16-way associative, 64 bytes/line
> 

Show me the full verbose dmesg of the boot then.

As another blind guess, try to disable pcid, vm.pmap.pcid_enabled=0.
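
To try this, the tunable goes into /boot/loader.conf and takes effect at the
next boot. A minimal fragment, assuming the usual name="value" loader.conf
syntax:

```
# /boot/loader.conf
# Disable PCID use in the amd64 pmap (a diagnostic experiment, not a fix).
vm.pmap.pcid_enabled="0"
```

After rebooting, `sysctl vm.pmap.pcid_enabled` should report 0.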



Re: [Intel AlderLake] Read files to FAT32 or UFS partition cause data corrupt due to P-Core-Core

2022-02-22 Thread Alexander Motin

On 22.02.2022 17:46, Konstantin Belousov wrote:
> Ok, the next step is to get the CPU feature reports from P- vs. E- cores.
> Patch below should work, with verbose boot.

Not much difference on that level:

--- zzzp2022-02-22 18:18:24.531704000 -0500
+++ zzze2022-02-22 18:18:18.631236000 -0500
@@ -1,22 +1,21 @@
-CPU 2: 12th Gen Intel(R) Core(TM) i7-12700K (3609.60-MHz K8-class CPU)
+CPU 16: 12th Gen Intel(R) Core(TM) i7-12700K (3609.60-MHz K8-class CPU)
   Origin="GenuineIntel"  Id=0x90672  Family=0x6  Model=0x97  Stepping=2
Features=0xbfebfbff
Features2=0x7ffafbff
   AMD Features=0x2c100800
   AMD Features2=0x121
   Structured Extended 
Features=0x239ca7eb
   Structured Extended 
Features2=0x98c027ac
   Structured Extended 
Features3=0xfc1cc410

   XSAVE Features=0xf
   IA32_ARCH_CAPS=0xd6b
   VT-x: Basic Features=0x3da0500
 Pin-Based Controls=0xff
 Primary Processor 
Controls=0xfffbfffe
 Secondary Processor 
Controls=0xf5d7fff

 Exit Controls=0x3da0500
 Entry Controls=0x3da0500
 EPT Features=0x6f34141
 VPID 
Features=0x10f01

   TSC: P-state invariant, performance statistics
-64-Byte prefetching
-L2 cache: 1280 kbytes, 8-way associative, 64 bytes/line
+L2 cache: 2048 kbytes, 16-way associative, 64 bytes/line

--
Alexander Motin



Re: [Intel AlderLake] Read files to FAT32 or UFS partition cause data corrupt due to P-Core-Core

2022-02-22 Thread Konstantin Belousov
On Sat, Feb 19, 2022 at 07:26:24PM -0500, Alexander Motin wrote:
> On 19.02.2022 13:23, Konstantin Belousov wrote:
> > On Sat, Feb 19, 2022 at 12:14:16PM -0500, Alexander Motin wrote:
> > > On 19.02.2022 12:02, Mike Karels wrote:
> > > > On 18 Feb 2022, at 20:55, Tomoaki AOKI wrote:
> > > > > Just a thought, but can it be the reason with timing (e.g., rendezvous
> > > > > within (i)threads, hardware controls without using hardware timer)
> > > > > problem?
> > > > > 
> > > > > On FreeBSD, IIUC, multi processor (multi core) implementation assumes
> > > > > SMP (differs only clock speed) and end up with difference of
> > > > > performance at same clock speed within P-core and E-core, possibly.
> > > > 
> > > > Another possibility is that the system is confused by having 
> > > > hyperthreading
> > > > on the P cores but not the E cores.
> > > 
> > > No, I've tried to disable SMT and different number of cores to make it 
> > > look
> > > identical and uniform for the scheduler.  The only thing I could not test 
> > > is
> > > disabling all P cores to test only E, the motherboard does not allow that,
> > > requiring at least one P core enabled.
> > 
> > Does the kernel select MWAIT as the idle method?  If you set idle to spin,
> > is anything change?
> 
> By default kernel selects ACPI, using MWAIT:
> machdep.idle: acpi
> dev.cpu.0.cx_method: C1/mwait/hwc C2/mwait/hwc C3/mwait/hwc
> 
> I've tried to do in loader:
> set machdep.idle_mwait=0
> set machdep.idle="spin"  (also tried "hlt")
> , but without visible positive effects.
I was only interested in spin, for hlt there is no chance if spin did not
work.

Ok, the next step is to get the CPU feature reports from P- vs. E- cores.
Patch below should work, with verbose boot.

diff --git a/sys/x86/x86/identcpu.c b/sys/x86/x86/identcpu.c
index 849f532dbf8b..9e4da4722f77 100644
--- a/sys/x86/x86/identcpu.c
+++ b/sys/x86/x86/identcpu.c
@@ -246,7 +246,7 @@ printcpuinfo(void)
 	u_int regs[4], i;
 	char *brand;
 
-	printf("CPU: ");
+	printf("CPU %d: ", PCPU_GET(cpuid));
 #ifdef __i386__
 	cpu_class = cpus[cpu].cpu_class;
 	strncpy(cpu_model, cpus[cpu].cpu_name, sizeof (cpu_model));
diff --git a/sys/x86/x86/mp_x86.c b/sys/x86/x86/mp_x86.c
index 3b0e25172d0d..4299eb5348e6 100644
--- a/sys/x86/x86/mp_x86.c
+++ b/sys/x86/x86/mp_x86.c
@@ -1089,7 +1089,7 @@ init_secondary_tail(void)
 	load_es(_udatasel);
 	load_fs(_ufssel);
 #endif
-
+	printcpuinfo();
 	mtx_unlock_spin(&ap_boot_mtx);
 
 	/* Wait until all the AP's are up. */



Re: [Intel AlderLake] Read files to FAT32 or UFS partition cause data corrupt due to P-Core-Core

2022-02-19 Thread Alexander Motin

On 19.02.2022 13:23, Konstantin Belousov wrote:
> On Sat, Feb 19, 2022 at 12:14:16PM -0500, Alexander Motin wrote:
> > On 19.02.2022 12:02, Mike Karels wrote:
> > > On 18 Feb 2022, at 20:55, Tomoaki AOKI wrote:
> > > > Just a thought, but can it be the reason with timing (e.g., rendezvous
> > > > within (i)threads, hardware controls without using hardware timer)
> > > > problem?
> > > >
> > > > On FreeBSD, IIUC, multi processor (multi core) implementation assumes
> > > > SMP (differs only clock speed) and end up with difference of
> > > > performance at same clock speed within P-core and E-core, possibly.
> > >
> > > Another possibility is that the system is confused by having hyperthreading
> > > on the P cores but not the E cores.
> >
> > No, I've tried to disable SMT and different number of cores to make it look
> > identical and uniform for the scheduler.  The only thing I could not test is
> > disabling all P cores to test only E, the motherboard does not allow that,
> > requiring at least one P core enabled.
>
> Does the kernel select MWAIT as the idle method?  If you set idle to spin,
> is anything change?

By default kernel selects ACPI, using MWAIT:
machdep.idle: acpi
dev.cpu.0.cx_method: C1/mwait/hwc C2/mwait/hwc C3/mwait/hwc

I've tried to do in loader:
set machdep.idle_mwait=0
set machdep.idle="spin"  (also tried "hlt")
, but without visible positive effects.

--
Alexander Motin



Re: [Intel AlderLake] Read files to FAT32 or UFS partition cause data corrupt due to P-Core-Core

2022-02-19 Thread Konstantin Belousov
On Sat, Feb 19, 2022 at 12:14:16PM -0500, Alexander Motin wrote:
> On 19.02.2022 12:02, Mike Karels wrote:
> > On 18 Feb 2022, at 20:55, Tomoaki AOKI wrote:
> > > Just a thought, but can it be the reason with timing (e.g., rendezvous
> > > within (i)threads, hardware controls without using hardware timer)
> > > problem?
> > > 
> > > On FreeBSD, IIUC, multi processor (multi core) implementation assumes
> > > SMP (differs only clock speed) and end up with difference of
> > > performance at same clock speed within P-core and E-core, possibly.
> > 
> > Another possibility is that the system is confused by having hyperthreading
> > on the P cores but not the E cores.
> 
> No, I've tried to disable SMT and different number of cores to make it look
> identical and uniform for the scheduler.  The only thing I could not test is
> disabling all P cores to test only E, the motherboard does not allow that,
> requiring at least one P core enabled.

Does the kernel select MWAIT as the idle method?  If you set idle to spin,
is anything change?



Re: [Intel AlderLake] Read files to FAT32 or UFS partition cause data corrupt due to P-Core-Core

2022-02-19 Thread Alexander Motin

On 19.02.2022 12:02, Mike Karels wrote:

On 18 Feb 2022, at 20:55, Tomoaki AOKI wrote:

Just a thought, but can it be the reason with timing (e.g., rendezvous
within (i)threads, hardware controls without using hardware timer)
problem?

On FreeBSD, IIUC, multi processor (multi core) implementation assumes
SMP (differs only clock speed) and end up with difference of
performance at same clock speed within P-core and E-core, possibly.


Another possibility is that the system is confused by having hyperthreading
on the P cores but not the E cores.


No, I've tried to disable SMT and different number of cores to make it 
look identical and uniform for the scheduler.  The only thing I could 
not test is disabling all P cores to test only E, the motherboard does 
not allow that, requiring at least one P core enabled.



BTW, how do the aarch64 guys implement big.LITTLE support to avoid such a case?
Or do they simply disable all LITTLE cores and use big only?


Are there supported arm64 systems with asymmetric processors yet?

Mike


>> On Fri, 18 Feb 2022 15:36:27 -0500
>> Alexander Motin  wrote:
>>
>>> This looks pretty weird to me, but I don't think it is specific to the
>>> FAT32.  Just today I've first noticed that booting TrueNAS 12.0-U8
>>> (http://download.freenas.org/12.0/STABLE/U8/x64/TrueNAS-12.0-U8.iso)
>>> (based on FreeBSD 12.2 with many backports) from NVMe SSD (I don't
>>> insist on NVMe so far) and ZFS almost never completes successfully,
>>> ending up in hangs or random stack corruption panics in ZFS threads as
>>> soon as at least one E core is enabled of my i7-12700K.  Disabling all E
>>> cores fixes the problem.  Updated to TrueNAS 13.0-BETA1 (based on
>>> FreeBSD 13.0-STABLE from few weeks ago) it does not demonstrate the
>>> problem any more.  The same TrueNAS 12.0-U8 kernel booted from NFS does
>>> not seem to demonstrate the problem with ZFS mounting, but I haven't
>>> stressed it much so far.
>>>
>>> There are seem to be dragons somewhere...
>>>
>>> On 15.02.2022 22:29, Chen, Alvin W wrote:
>>>> Hi Guys,
>>>> Any updates to support Intel P-core + E-core?
>>>> I have filed a bug: PR 261169
>>>>  , but no updates.
>>>> Does anybody know the progress?
>>>>
>>>> For Intel Adler Lake P core + E core processor (i7-12700T), copying
>>>> files to FAT32 partition, the file corrupted (50%), but ZFS is fine.
>>>> After disabling E core in the code by restrict the max cpu number, this
>>>> issue is gone. And No E core processor has no such issue, like i7-12400.
>>>>
>>>> HW ENV:
>>>> CPU: Intel AlderLake 12th Gen i7-12700T
>>>> Disk: NVME SSD
>>>>
>>>> There are 3 methods to reproduce this issue:
>>>> 1. Make FreeBSD 13 USB disk installer, install FreeBSD with UFS, and
>>>> select install source and ports, the txz package checking will be failed.
>>>>
>>>> 2. Boot to shell by USB disk installer, and mount a FAT32 partition (on
>>>> SSD), and copy a 300MB file to the FAT32, compare the sha256 checksums
>>>> for the source file and the dst file, the checksum are different (50%).
>>>> Or if there is a 300MB file in FAT32 partition, mount the partition, and
>>>> for the first time check the sha256 value by running 'sha256 file.tgz',
>>>> the checksum is wrong, but the second time, the checksum is correct.
>>>>
>>>> 3. Install FreeBSD 13 with ZFS, and it can work well. And boot into
>>>> FreeBSD, disable swap, and format the SWAP partition to FAT32. Do the
>>>> testing as above.
>>>>
>>>> Regards,
>>>>
>>>> Alvin Chen
>>>>
>>>> Dell | Comercial Client Group
>>>>
>>>> office +86-10-82862506, fax +86-10-82861554, Dell Lync 8672506
>>>> weike_c...@dell.com 
>>>>
>>>> Internal Use - Confidential
>>>>
>>>
>>> -- 
>>> Alexander Motin
>>>
>>
>>
>> -- 
>> 青木 知明  [Tomoaki AOKI]

-- 
Alexander Motin



Re: [Intel AlderLake] Read files to FAT32 or UFS partition cause data corrupt due to P-Core-Core

2022-02-19 Thread Mike Karels


On 18 Feb 2022, at 20:55, Tomoaki AOKI wrote:

> Just a thought, but can it be the reason with timing (e.g., rendezvous
> within (i)threads, hardware controlls without using hardware timer)
> problem?
>
> On FreeBSD, IIUC, multi processor (multi core) implementation assumes
> SMP (differs only clock speed) and end up with difference of
> performance at same clock speed within P-core and E-core, possibly.

Another possibility is that the system is confused by having hyperthreading
on the P cores but not the E cores.

> BTW, how aarch64 guys implement big.Little support to avoid such a case?
> Or are they simply disable all Little cores and use big only?

Are there supported arm64 systems with asymmetric processors yet?

Mike

> On Fri, 18 Feb 2022 15:36:27 -0500
> Alexander Motin  wrote:
>
>> This looks pretty weird to me, but I don't think it is specific to the
>> FAT32.  Just today I've first noticed that booting TrueNAS 12.0-U8
>> (http://download.freenas.org/12.0/STABLE/U8/x64/TrueNAS-12.0-U8.iso)
>> (based on FreeBSD 12.2 with many backports) from NVMe SSD (I don't
>> insist on NVMe so far) and ZFS almost never completes successfully,
>> ending up in hangs or random stack corruption panics in ZFS threads as
>> soon as at least one E core is enabled of my i7-12700K.  Disabling all E
>> cores fixes the problem.  Updated to TrueNAS 13.0-BETA1 (based on
>> FreeBSD 13.0-STABLE from few weeks ago) it does not demonstrate the
>> problem any more.  The same TrueNAS 12.0-U8 kernel booted from NFS does
>> not seem to demonstrate the problem with ZFS mounting, but I haven't
>> stressed it much so far.
>>
>> There are seem to be dragons somewhere...
>>
>> On 15.02.2022 22:29, Chen, Alvin W wrote:
>>> Hi Guys,
>>> Any updates to support Intel P-core + E-core?
>>> I have filed a bug: PR 261169
>>>  , but no updates.
>>> Does anybody know the progress?
>>>
>>> For Intel Adler Lake P core + E core processor (i7-12700T), copying
>>> files to FAT32 partition, the file corrupted (50%), but ZFS is fine.
>>> After disabling E core in the code by restrict the max cpu number, this
>>> issue is gone. And No E core processor has no such issue, like i7-12400.
>>>
>>> HW ENV:
>>> CPU: Intel AlderLake 12th Gen i7-12700T
>>> Disk: NVME SSD
>>>
>>> There are 3 methods to reproduce this issue:
>>> 1. Make FreeBSD 13 USB disk installer, install FreeBSD with UFS, and
>>> select install source and ports, the txz package checking will be failed.
>>>
>>> 2. Boot to shell by USB disk installer, and mount a FAT32 partition (on
>>> SSD), and copy a 300MB file to the FAT32, compare the sha256 checksums
>>> for the source file and the dst file, the checksum are different (50%).
>>> Or if there is a 300MB file in FAT32 partition, mount the partition, and
>>> for the first time check the sha256 value by running 'sha256 file.tgz',
>>> the checksum is wrong, but the second time, the checksum is correct.
>>>
>>> 3. Install FreeBSD 13 with ZFS, and it can work well. And boot into
>>> FreeBSD, disable swap, and format the SWAP partition to FAT32. Do the
>>> testing as above.
>>>
>>> Regards,
>>>
>>> Alvin Chen
>>>
>>> Dell | Comercial Client Group
>>>
>>> office +86-10-82862506, fax +86-10-82861554, Dell Lync 8672506
>>> weike_c...@dell.com 
>>>
>>> Internal Use - Confidential
>>>
>>
>> -- 
>> Alexander Motin
>>
>
>
> -- 
> 青木 知明  [Tomoaki AOKI]



Re: [Intel AlderLake]Read files to FAT32 or UFS partition cause data corrupt due to P-Core-Core

2022-02-18 Thread Tomoaki AOKI
Just a thought, but could the cause be a timing problem (e.g., rendezvous
between (i)threads, or hardware control done without a hardware timer)?

On FreeBSD, IIUC, the multiprocessor (multi-core) implementation assumes
SMP (cores differing only in clock speed), so it may end up tripped by
P-cores and E-cores performing differently at the same clock speed.

BTW, how do the aarch64 folks implement big.LITTLE support to avoid such
a case? Or do they simply disable all LITTLE cores and use only the big ones?


On Fri, 18 Feb 2022 15:36:27 -0500
Alexander Motin  wrote:

> This looks pretty weird to me, but I don't think it is specific to the 
> FAT32.  Just today I've first noticed that booting TrueNAS 12.0-U8 
> (http://download.freenas.org/12.0/STABLE/U8/x64/TrueNAS-12.0-U8.iso) 
> (based on FreeBSD 12.2 with many backports) from NVMe SSD (I don't 
> insist on NVMe so far) and ZFS almost never completes successfully, 
> ending up in hangs or random stack corruption panics in ZFS threads as 
> soon as at least one E core is enabled of my i7-12700K.  Disabling all E 
> cores fixes the problem.  Updated to TrueNAS 13.0-BETA1 (based on 
> FreeBSD 13.0-STABLE from few weeks ago) it does not demonstrate the 
> problem any more.  The same TrueNAS 12.0-U8 kernel booted from NFS does 
> not seem to demonstrate the problem with ZFS mounting, but I haven't 
> stressed it much so far.
> 
> There are seem to be dragons somewhere...
> 
> On 15.02.2022 22:29, Chen, Alvin W wrote:
> > Hi Guys,
> > Any updates to support Intel P-core + E-core?
> > I have filed a bug: PR 261169 
> > , but no updates.
> > Does anybody know the progress?
> > 
> > For Intel Adler Lake P core + E core processor (i7-12700T), copying 
> > files to FAT32 partition, the file corrupted (50%), but ZFS is fine. 
> > After disabling E core in the code by restrict the max cpu number, this 
> > issue is gone. And No E core processor has no such issue, like i7-12400.
> > 
> > HW ENV:
> > CPU: Intel AlderLake 12th Gen i7-12700T
> > Disk: NVME SSD
> > 
> > There are 3 methods to reproduce this issue:
> > 1. Make FreeBSD 13 USB disk installer, install FreeBSD with UFS, and 
> > select install source and ports, the txz package checking will be failed.
> > 
> > 2. Boot to shell by USB disk installer, and mount a FAT32 partition (on 
> > SSD), and copy a 300MB file to the FAT32, compare the sha256 checksums 
> > for the source file and the dst file, the checksum are different (50%). 
> > Or if there is a 300MB file in FAT32 partition, mount the partition, and 
> > for the first time check the sha256 value by running 'sha256 file.tgz', 
> > the checksum is wrong, but the second time, the checksum is correct.
> > 
> > 3. Install FreeBSD 13 with ZFS, and it can work well. And boot into 
> > FreeBSD, disable swap, and format the SWAP partition to FAT32. Do the 
> > testing as above.
> > 
> > Regards,
> > 
> > Alvin Chen
> > 
> > Dell | Comercial Client Group
> > 
> > office +86-10-82862506, fax +86-10-82861554, Dell Lync 8672506 
> > weike_c...@dell.com 
> > 
> > Internal Use - Confidential
> > 
> 
> -- 
> Alexander Motin
> 


-- 
青木 知明  [Tomoaki AOKI]



Re: [Intel AlderLake]Read files to FAT32 or UFS partition cause data corrupt due to P-Core-Core

2022-02-18 Thread Alexander Motin
This looks pretty weird to me, but I don't think it is specific to 
FAT32.  Just today I first noticed that booting TrueNAS 12.0-U8 
(http://download.freenas.org/12.0/STABLE/U8/x64/TrueNAS-12.0-U8.iso) 
(based on FreeBSD 12.2 with many backports) from an NVMe SSD (I don't 
insist on NVMe so far) with ZFS almost never completes successfully, 
ending up in hangs or random stack corruption panics in ZFS threads as 
soon as at least one E core of my i7-12700K is enabled.  Disabling all E 
cores fixes the problem.  After updating to TrueNAS 13.0-BETA1 (based on 
FreeBSD 13.0-STABLE from a few weeks ago), the problem no longer shows 
up.  The same TrueNAS 12.0-U8 kernel booted from NFS does not seem to 
demonstrate the problem with ZFS mounting, but I haven't stressed it 
much so far.

There seem to be dragons somewhere...

On 15.02.2022 22:29, Chen, Alvin W wrote:
> Hi Guys,
> Any updates to support Intel P-core + E-core?
> I have filed a bug: PR 261169 
> , but no updates.
> Does anybody know the progress?
> 
> For Intel Adler Lake P core + E core processor (i7-12700T), copying 
> files to FAT32 partition, the file corrupted (50%), but ZFS is fine. 
> After disabling E core in the code by restrict the max cpu number, this 
> issue is gone. And No E core processor has no such issue, like i7-12400.
> 
> HW ENV:
> CPU: Intel AlderLake 12th Gen i7-12700T
> Disk: NVME SSD
> 
> There are 3 methods to reproduce this issue:
> 1. Make FreeBSD 13 USB disk installer, install FreeBSD with UFS, and 
> select install source and ports, the txz package checking will be failed.
> 
> 2. Boot to shell by USB disk installer, and mount a FAT32 partition (on 
> SSD), and copy a 300MB file to the FAT32, compare the sha256 checksums 
> for the source file and the dst file, the checksum are different (50%). 
> Or if there is a 300MB file in FAT32 partition, mount the partition, and 
> for the first time check the sha256 value by running 'sha256 file.tgz', 
> the checksum is wrong, but the second time, the checksum is correct.
> 
> 3. Install FreeBSD 13 with ZFS, and it can work well. And boot into 
> FreeBSD, disable swap, and format the SWAP partition to FAT32. Do the 
> testing as above.
> 
> Regards,
> 
> Alvin Chen
> 
> Dell | Comercial Client Group
> 
> office +86-10-82862506, fax +86-10-82861554, Dell Lync 8672506 
> weike_c...@dell.com 
> 
> Internal Use - Confidential



--
Alexander Motin



[Intel AlderLake]Read files to FAT32 or UFS partition cause data corrupt due to P-Core-Core

2022-02-15 Thread Chen, Alvin W
Hi Guys,
Are there any updates on support for Intel P-core + E-core processors?
I have filed a bug, PR 261169, but it has had no updates.
Does anybody know the progress?

On an Intel Alder Lake processor with P cores and E cores (i7-12700T), files 
copied to a FAT32 partition end up corrupted (50%), while ZFS is fine. After 
disabling the E cores in the code by restricting the maximum CPU number, the 
issue is gone. Processors without E cores, like the i7-12400, do not show the 
issue.

HW ENV:
CPU: Intel AlderLake 12th Gen i7-12700T
Disk: NVMe SSD

There are 3 methods to reproduce this issue:
1. Make a FreeBSD 13 USB disk installer, install FreeBSD with UFS, and select 
the source and ports sets for installation; the txz package check will fail.

2. Boot to a shell from the USB disk installer, mount a FAT32 partition (on the 
SSD), copy a 300MB file to it, and compare the sha256 checksums of the source 
and destination files; the checksums differ (50%). Alternatively, if there is 
already a 300MB file on the FAT32 partition, mount the partition and run 
'sha256 file.tgz': the first time the checksum is wrong, but the second time 
it is correct.

3. Install FreeBSD 13 with ZFS, which works well. Then boot into FreeBSD, 
disable swap, format the swap partition as FAT32, and run the tests above.
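
The checksum comparison in method 2 can be scripted so it is easy to repeat many times. The following is only a rough Python sketch, not the procedure actually used for the report: the function names (sha256_of, copy_and_verify, reread_digests) and the copy_N_ file naming are made up for illustration. It also does not remount the partition between reads, so a re-read may be served from the buffer cache rather than from disk.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src, dst_dir, rounds=10):
    """Copy src into dst_dir several times; return the rounds whose digest differs."""
    expected = sha256_of(src)
    mismatches = []
    for i in range(rounds):
        dst = Path(dst_dir) / f"copy_{i}_{Path(src).name}"
        shutil.copyfile(src, dst)
        if sha256_of(dst) != expected:
            mismatches.append(i)
    return mismatches


def reread_digests(path, reads=2):
    """Hash the same file several times in a row, as in the 'first read wrong' case."""
    return [sha256_of(path) for _ in range(reads)]
```

For example, copy_and_verify("/tmp/file.tgz", "/mnt/fat32", rounds=20) would return the indexes of the corrupted copies (both paths are placeholders), and reread_digests("/mnt/fat32/file.tgz") run right after mounting would show whether the first and second digests disagree.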



Regards,
Alvin Chen
Dell | Comercial Client Group
office +86-10-82862506, fax +86-10-82861554, Dell Lync 8672506 
weike_c...@dell.com



Internal Use - Confidential