Re: [Intel AlderLake] Read files to FAT32 or UFS partition cause data corrupt due to P-Core-Core

2022-02-19 Thread Alexander Motin

On 19.02.2022 13:23, Konstantin Belousov wrote:
> On Sat, Feb 19, 2022 at 12:14:16PM -0500, Alexander Motin wrote:
>> On 19.02.2022 12:02, Mike Karels wrote:
>>> On 18 Feb 2022, at 20:55, Tomoaki AOKI wrote:
>>>> Just a thought, but could the cause be a timing problem (e.g.,
>>>> rendezvous within (i)threads, hardware control without using a
>>>> hardware timer)?
>>>>
>>>> On FreeBSD, IIUC, the multiprocessor (multi-core) implementation
>>>> assumes SMP (cores differing only in clock speed) and possibly runs
>>>> into the difference in performance at the same clock speed between
>>>> P-cores and E-cores.
>>>
>>> Another possibility is that the system is confused by having
>>> hyperthreading on the P cores but not the E cores.
>>
>> No, I've tried disabling SMT and using different numbers of cores to
>> make the system look identical and uniform to the scheduler.  The only
>> thing I could not test is disabling all P cores to test only the E
>> cores: the motherboard does not allow that, requiring at least one P
>> core to be enabled.
>
> Does the kernel select MWAIT as the idle method?  If you set idle to spin,
> does anything change?


By default the kernel selects ACPI, using MWAIT:
machdep.idle: acpi
dev.cpu.0.cx_method: C1/mwait/hwc C2/mwait/hwc C3/mwait/hwc

I've tried setting the following in the loader:
set machdep.idle_mwait=0
set machdep.idle="spin"  (also tried "hlt")
but without visible positive effects.
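
For anyone comparing idle settings between test boots, here is a minimal
sketch that splits a dev.cpu.N.cx_method value like the one above into its
per-state parts and reports whether any state uses MWAIT.  The function
names are hypothetical, and the reading of each slash-separated field
(state, method, extra flags) is an assumption based only on the string
shown in this message.

```python
def parse_cx_method(value):
    """Split a cx_method string into (state, method, flags) tuples.

    Assumption: entries are whitespace-separated and fields within an
    entry are separated by '/', as in "C1/mwait/hwc".
    """
    entries = []
    for token in value.split():
        parts = token.split("/")
        state = parts[0]
        method = parts[1] if len(parts) > 1 else ""
        flags = parts[2:]
        entries.append((state, method, flags))
    return entries


def uses_mwait(value):
    """Return True if any C-state entry names mwait as its method."""
    return any(method == "mwait" for _, method, _ in parse_cx_method(value))


# The value reported in this message.
reported = "C1/mwait/hwc C2/mwait/hwc C3/mwait/hwc"
for state, method, flags in parse_cx_method(reported):
    print(state, method, flags)
print("MWAIT in use:", uses_mwait(reported))
```

On a live FreeBSD system the input string could be taken from the output
of "sysctl -n dev.cpu.0.cx_method" before and after changing the loader
settings.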

--
Alexander Motin



Re: Panic, CURRENT, yesterday

2022-02-19 Thread Larry Rosenman

On 02/09/2022 10:08 pm, Larry Rosenman wrote:
> Another one today:
> ❯ more /var/crash/core.txt.1
> borg.lerctr.org dumped core - see /var/crash/vmcore.1
>
> Wed Feb  9 19:30:43 CST 2022
>
> core is available, and I can give access and/or send the core and
> kernel/debug stuff.

True for this one too.


Yet another one:
❯ more core.txt.3
borg.lerctr.org dumped core - see /var/crash/vmcore.3

Sat Feb 19 00:42:59 CST 2022

FreeBSD borg.lerctr.org 14.0-CURRENT FreeBSD 14.0-CURRENT #56 
ler/freebsd-main-changes-n253181-c140933ef40: Tue Feb 15 12:26:23 CST 
2022 
r...@borg.lerctr.org:/usr/obj/usr/src/amd64.amd64/sys/LER-MINIMAL  amd64


panic: ng_snd_item: 42 != 173

GNU gdb (GDB) 11.2 [GDB v11.2 for FreeBSD]
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 


This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd14.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /boot/kernel/kernel...
Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...

Unread portion of the kernel message buffer:
panic: ng_snd_item: 42 != 173
cpuid = 0
time = 1645251876
KDB: stack backtrace:
#0 0x80516005 at kdb_backtrace+0x65
#1 0x804cba7f at vpanic+0x17f
#2 0x804cb853 at panic+0x43
#3 0x82c755b7 at ng_snd_item+0x587
#4 0x82c8e263 at ng_ether_output+0xb3
#5 0x805e0e2d at ether_output+0x6cd
#6 0x805f6461 at arpintr+0xd71
#7 0x805e5797 at netisr_dispatch_src+0x97
#8 0x805e112e at ether_demux+0x14e
#9 0x82c8e89c at ng_ether_rcv_upper+0x12c
#10 0x82c75dab at ng_apply_item+0x7eb
#11 0x82c7538d at ng_snd_item+0x35d
#12 0x82c75dab at ng_apply_item+0x7eb
#13 0x82c7538d at ng_snd_item+0x35d
#14 0x82c8e33f at ng_ether_input+0x9f
#15 0x805e23e7 at ether_nh_input+0x217
#16 0x805e5797 at netisr_dispatch_src+0x97
#17 0x805e159d at ether_input+0x5d
Uptime: 2d6h42m17s
Dumping 29172 out of 131023 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%


__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55  __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,

(kgdb) #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=)
at /usr/src/sys/kern/kern_shutdown.c:399
#2  0x804cb68f in kern_reboot (howto=260)
at /usr/src/sys/kern/kern_shutdown.c:487
#3  0x804cbaee in vpanic (fmt=0x82c7ed98 "%s: %d != %d",
ap=) at /usr/src/sys/kern/kern_shutdown.c:920
#4  0x804cb853 in panic (fmt=)
at /usr/src/sys/kern/kern_shutdown.c:844
#5  0x82c755b7 in ng_snd_item (item=0xf8131de0bd80, flags=0)
at /usr/src/sys/netgraph/ng_base.c:2256
#6  0x82c8e263 in ng_ether_output (ifp=,
ifp@entry=,
mp=0xfe025a044868,
mp@entry=)
at /usr/src/sys/netgraph/ng_ether.c:294
#7  0x805e0e2d in ether_output (ifp=0xf8010cfe0800,
m=0xf81d2e92b000, dst=, ro=)
at /usr/src/sys/net/if_ethersubr.c:427
#8  0x805f6461 in in_arpinput (m=0xf81d2e92b000)
at /usr/src/sys/netinet/if_ether.c:1129
#9  arpintr (m=0xf81d2e92b000,
m@entry=)
at /usr/src/sys/netinet/if_ether.c:739
#10 0x805e5797 in netisr_dispatch_src (proto=4,
source=source@entry=0, m=0xf81d2e92b000)
at /usr/src/sys/net/netisr.c:1153
#11 0x805e5aef in netisr_dispatch (proto=,
m=) at /usr/src/sys/net/netisr.c:1244
#12 0x805e112e in ether_demux (ifp=ifp@entry=0xf8010cfe0800,
m=, m@entry=0xf81d2e92b000)
at /usr/src/sys/net/if_ethersubr.c:926
#13 0x82c8e89c in ng_ether_rcv_upper (hook=,
hook@entry=,
item=0xf8131de0bd80,
item@entry=)
at /usr/src/sys/netgraph/ng_ether.c:742
#14 0x82c75dab in ng_apply_item (node=node@entry=0xf81365630b00,
item=item@entry=0xf8131de0bd80, rw=0)
at /usr/src/sys/netgraph/ng_base.c:2406
#15 0x82c7538d in ng_snd_item (item=0xf8131de0bd80,
item@entry=, flags=0,
flags@entry=)
at /usr/src/sys/netgraph/ng_base.c:2323
#16 0x82c75dab in ng_apply_item (node=node@entry=0xf813660f8500,
item=item@entry=0xf8131de0bd80, rw=0)
at /usr/src/sys/netgraph/ng_base.c:2406
#17 0x82c7538d in ng_snd_item (item=item@entry=0xf8131de0bd80,
flags=flags@entry=0) at /usr/src/sys/netgraph/ng_base.c:2323
#18 0x82c8e33f in ng_ether_input (ifp=,
ifp@entry=,
   

Re: [Intel AlderLake] Read files to FAT32 or UFS partition cause data corrupt due to P-Core-Core

2022-02-19 Thread Konstantin Belousov
On Sat, Feb 19, 2022 at 12:14:16PM -0500, Alexander Motin wrote:
> On 19.02.2022 12:02, Mike Karels wrote:
> > On 18 Feb 2022, at 20:55, Tomoaki AOKI wrote:
> > > Just a thought, but could the cause be a timing problem (e.g.,
> > > rendezvous within (i)threads, hardware control without using a
> > > hardware timer)?
> > >
> > > On FreeBSD, IIUC, the multiprocessor (multi-core) implementation
> > > assumes SMP (cores differing only in clock speed) and possibly runs
> > > into the difference in performance at the same clock speed between
> > > P-cores and E-cores.
> >
> > Another possibility is that the system is confused by having hyperthreading
> > on the P cores but not the E cores.
>
> No, I've tried disabling SMT and using different numbers of cores to make
> the system look identical and uniform to the scheduler.  The only thing I
> could not test is disabling all P cores to test only the E cores: the
> motherboard does not allow that, requiring at least one P core to be enabled.

Does the kernel select MWAIT as the idle method?  If you set idle to spin,
does anything change?



Re: [Intel AlderLake] Read files to FAT32 or UFS partition cause data corrupt due to P-Core-Core

2022-02-19 Thread Alexander Motin

On 19.02.2022 12:02, Mike Karels wrote:
> On 18 Feb 2022, at 20:55, Tomoaki AOKI wrote:
>> Just a thought, but could the cause be a timing problem (e.g.,
>> rendezvous within (i)threads, hardware control without using a
>> hardware timer)?
>>
>> On FreeBSD, IIUC, the multiprocessor (multi-core) implementation
>> assumes SMP (cores differing only in clock speed) and possibly runs
>> into the difference in performance at the same clock speed between
>> P-cores and E-cores.
>
> Another possibility is that the system is confused by having hyperthreading
> on the P cores but not the E cores.

No, I've tried disabling SMT and using different numbers of cores to make
the system look identical and uniform to the scheduler.  The only thing I
could not test is disabling all P cores to test only the E cores: the
motherboard does not allow that, requiring at least one P core to be
enabled.

>> BTW, how do the aarch64 guys implement big.LITTLE support to avoid such
>> a case?  Or do they simply disable all LITTLE cores and use only the big
>> ones?
>
> Are there supported arm64 systems with asymmetric processors yet?
>
> Mike
>
>> On Fri, 18 Feb 2022 15:36:27 -0500
>> Alexander Motin  wrote:
>>
>>> This looks pretty weird to me, but I don't think it is specific to
>>> FAT32.  Just today I first noticed that booting TrueNAS 12.0-U8
>>> (http://download.freenas.org/12.0/STABLE/U8/x64/TrueNAS-12.0-U8.iso)
>>> (based on FreeBSD 12.2 with many backports) from an NVMe SSD (I don't
>>> insist on NVMe so far) with ZFS almost never completes successfully,
>>> ending up in hangs or random stack corruption panics in ZFS threads as
>>> soon as at least one E core of my i7-12700K is enabled.  Disabling all
>>> E cores fixes the problem.  After updating to TrueNAS 13.0-BETA1 (based
>>> on FreeBSD 13.0-STABLE from a few weeks ago), it no longer demonstrates
>>> the problem.  The same TrueNAS 12.0-U8 kernel booted from NFS does not
>>> seem to demonstrate the problem with ZFS mounting, but I haven't
>>> stressed it much so far.
>>>
>>> There seem to be dragons somewhere...
>>>
>>> On 15.02.2022 22:29, Chen, Alvin W wrote:
>>>> Hi guys,
>>>> Are there any updates on supporting Intel P-core + E-core?
>>>> I have filed a bug, PR 261169, but there have been no updates.
>>>> Does anybody know the progress?
>>>>
>>>> On an Intel Alder Lake P-core + E-core processor (i7-12700T), files
>>>> copied to a FAT32 partition are corrupted (50%), but ZFS is fine.
>>>> After disabling the E cores in the code by restricting the maximum
>>>> CPU number, the issue is gone.  Processors without E cores, like the
>>>> i7-12400, have no such issue.
>>>>
>>>> HW ENV:
>>>> CPU: Intel Alder Lake 12th Gen i7-12700T
>>>> Disk: NVMe SSD
>>>>
>>>> There are 3 methods to reproduce this issue:
>>>> 1. Make a FreeBSD 13 USB disk installer, install FreeBSD with UFS,
>>>> and select installing source and ports; the txz package check will
>>>> fail.
>>>>
>>>> 2. Boot to a shell from the USB disk installer, mount a FAT32
>>>> partition (on the SSD), copy a 300MB file to it, and compare the
>>>> sha256 checksums of the source and destination files; the checksums
>>>> differ (50%).  Or, if there is already a 300MB file on the FAT32
>>>> partition, mount the partition and check the sha256 value by running
>>>> 'sha256 file.tgz': the first time the checksum is wrong, but the
>>>> second time it is correct.
>>>>
>>>> 3. Install FreeBSD 13 with ZFS, which works well.  Then boot into
>>>> FreeBSD, disable swap, and format the swap partition as FAT32.  Run
>>>> the same tests as above.
>>>>
>>>> Regards,
>>>>
>>>> Alvin Chen
>>>>
>>>> Dell | Commercial Client Group
>>>>
>>>> office +86-10-82862506, fax +86-10-82861554, Dell Lync 8672506
>>>> weike_c...@dell.com
>>>>
>>>> Internal Use - Confidential
>>>
>>> --
>>> Alexander Motin
>>
>> --
>> 青木 知明  [Tomoaki AOKI]

--
Alexander Motin
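
Reproduction method 2 from the report quoted above (copy a file to the
suspect partition and compare sha256 checksums) can be sketched in Python
as follows.  This is only an illustration under stated assumptions: the
helper names (sha256_of, copy_and_verify), the number of rounds, and the
paths are hypothetical, and a read-back immediately after the copy may be
served from cache rather than from the disk, so unmounting and remounting
between the copy and the check may be needed to exercise the device path.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src, dst_dir, rounds=10):
    """Copy src into dst_dir `rounds` times and re-hash each copy.

    Returns a list of (round, expected, actual) tuples for mismatches.
    """
    src = Path(src)
    expected = sha256_of(src)
    mismatches = []
    for i in range(rounds):
        dst = Path(dst_dir) / f"{src.name}.copy{i}"
        shutil.copyfile(src, dst)
        actual = sha256_of(dst)
        if actual != expected:
            mismatches.append((i, expected, actual))
        dst.unlink()  # remove the copy so rounds do not fill the partition
    return mismatches


# Self-contained demo on temporary directories (hypothetical stand-ins for
# the source location and the mounted FAT32 partition).
with tempfile.TemporaryDirectory() as src_dir, \
        tempfile.TemporaryDirectory() as dst_dir:
    sample = Path(src_dir) / "file.tgz"
    sample.write_bytes(b"\x5a" * (4 * 1024 * 1024))  # 4 MiB test payload
    bad = copy_and_verify(sample, dst_dir, rounds=3)
    print("mismatching rounds:", len(bad))
```

For a real test, src would be the 300MB file and dst_dir the mount point
of the FAT32 partition; any non-empty result would correspond to the
checksum mismatch described in the report.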



Re: [Intel AlderLake] Read files to FAT32 or UFS partition cause data corrupt due to P-Core-Core

2022-02-19 Thread Mike Karels


On 18 Feb 2022, at 20:55, Tomoaki AOKI wrote:

> Just a thought, but could the cause be a timing problem (e.g.,
> rendezvous within (i)threads, hardware control without using a
> hardware timer)?
>
> On FreeBSD, IIUC, the multiprocessor (multi-core) implementation
> assumes SMP (cores differing only in clock speed) and possibly runs
> into the difference in performance at the same clock speed between
> P-cores and E-cores.

Another possibility is that the system is confused by having hyperthreading
on the P cores but not the E cores.

> BTW, how do the aarch64 guys implement big.LITTLE support to avoid such
> a case?  Or do they simply disable all LITTLE cores and use only the big
> ones?

Are there supported arm64 systems with asymmetric processors yet?

Mike

> On Fri, 18 Feb 2022 15:36:27 -0500
> Alexander Motin  wrote:
>
>> This looks pretty weird to me, but I don't think it is specific to
>> FAT32.  Just today I first noticed that booting TrueNAS 12.0-U8
>> (http://download.freenas.org/12.0/STABLE/U8/x64/TrueNAS-12.0-U8.iso)
>> (based on FreeBSD 12.2 with many backports) from an NVMe SSD (I don't
>> insist on NVMe so far) with ZFS almost never completes successfully,
>> ending up in hangs or random stack corruption panics in ZFS threads as
>> soon as at least one E core of my i7-12700K is enabled.  Disabling all
>> E cores fixes the problem.  After updating to TrueNAS 13.0-BETA1 (based
>> on FreeBSD 13.0-STABLE from a few weeks ago), it no longer demonstrates
>> the problem.  The same TrueNAS 12.0-U8 kernel booted from NFS does not
>> seem to demonstrate the problem with ZFS mounting, but I haven't
>> stressed it much so far.
>>
>> There seem to be dragons somewhere...
>>
>> On 15.02.2022 22:29, Chen, Alvin W wrote:
>>> Hi guys,
>>> Are there any updates on supporting Intel P-core + E-core?
>>> I have filed a bug, PR 261169, but there have been no updates.
>>> Does anybody know the progress?
>>>
>>> On an Intel Alder Lake P-core + E-core processor (i7-12700T), files
>>> copied to a FAT32 partition are corrupted (50%), but ZFS is fine.
>>> After disabling the E cores in the code by restricting the maximum
>>> CPU number, the issue is gone.  Processors without E cores, like the
>>> i7-12400, have no such issue.
>>>
>>> HW ENV:
>>> CPU: Intel Alder Lake 12th Gen i7-12700T
>>> Disk: NVMe SSD
>>>
>>> There are 3 methods to reproduce this issue:
>>> 1. Make a FreeBSD 13 USB disk installer, install FreeBSD with UFS,
>>> and select installing source and ports; the txz package check will
>>> fail.
>>>
>>> 2. Boot to a shell from the USB disk installer, mount a FAT32
>>> partition (on the SSD), copy a 300MB file to it, and compare the
>>> sha256 checksums of the source and destination files; the checksums
>>> differ (50%).  Or, if there is already a 300MB file on the FAT32
>>> partition, mount the partition and check the sha256 value by running
>>> 'sha256 file.tgz': the first time the checksum is wrong, but the
>>> second time it is correct.
>>>
>>> 3. Install FreeBSD 13 with ZFS, which works well.  Then boot into
>>> FreeBSD, disable swap, and format the swap partition as FAT32.  Run
>>> the same tests as above.
>>>
>>> Regards,
>>>
>>> Alvin Chen
>>>
>>> Dell | Commercial Client Group
>>>
>>> office +86-10-82862506, fax +86-10-82861554, Dell Lync 8672506
>>> weike_c...@dell.com 
>>>
>>> Internal Use - Confidential
>>>
>>
>> -- 
>> Alexander Motin
>>
>
>
> -- 
> 青木 知明  [Tomoaki AOKI]