Re: 14.0 boot failure

2023-08-08 Thread Matthias Apitz



A kernel built from sources git-cloned on August 6 boots fine.

--
Matthias Apitz
E-mail: g...@unixarea.de
WWW: http://www.unixarea.de/
phone: +49-170-4527211

On 08.08.2023 19:39, Graham Perrin wrote:

On 05/08/2023 00:45, Kevin Oberman wrote:
A new kernel built from sources pulled today (4-Aug) at 5:26 UTC fails 
to boot. …


I refrained from updating after reading this.

Any news?

TIA




Re: [Intel AlderLake] Read files to FAT32 or UFS partition cause data corrupt due to P-Core-Core

2023-08-08 Thread Michael Butler

On 8/8/23 10:56, Tomoaki AOKI wrote:

On Tue, 8 Aug 2023 17:02:32 +0300
Konstantin Belousov  wrote:


 [ .. snip .. ]


The workaround is switched on automatically, when kernel detects 'small cores'
reported by CPUID.


If I read the code correctly, vm.pmap.pcid_invlpg_workaround
(precisely, the corresponding variable) is set to non-zero when the
workaround is enabled. I'm not sure it was detected correctly in the
original reporter's environment, but forcibly setting the tunable to 1
was not reported to help sufficiently.
Currently, only setting the tunable vm.pmap.pcid_enabled to 0 seems to help.


I'm seeing similar stability problems on an N95-based device. This too
is an Alder Lake-N device with only E-cores, although I'm running it with
a build compiled with CPUTYPE=tremont. From an older, verbose start-up:


PPIM 0: PA=0x40, VA=0x8271, size=0x1d5000, mode=0x1
pmap: large map 8 PML4 slots (4096 GB)
VT(efifb): resolution 800x600
Preloaded elf kernel "/boot/kernel.new/kernel" at 0x8234e000.
Preloaded boot_entropy_cache "/boot/entropy" at 0x82357d08.
Preloaded cpu_microcode "/boot/firmware/intel-ucode.bin" at 0x82357d60.
Preloaded hostuuid "/etc/hostid" at 0x82357dc0.
Preloaded TSLOG data "TSLOG" at 0x82357e10.
CPU: Intel(R) N95 (1689.60-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0xb06e0  Family=0x6  Model=0xbe  Stepping=0
  Features=0xbfebfbff
  Features2=0x7ffafbbf
  AMD Features=0x2c100800
  AMD Features2=0x121
  Structured Extended Features=0x239ca7eb
  Structured Extended Features2=0x98c007bc
  Structured Extended Features3=0xfc184410
  XSAVE Features=0xf
  IA32_ARCH_CAPS=0x180fd6b
  VT-x: Basic Features=0x3da0500
        Pin-Based Controls=0xff
        Primary Processor Controls=0xfffbfffe
        Secondary Processor Controls=0x75d7fff
        Exit Controls=0x3da0500
        Entry Controls=0x3da0500
        EPT Features=0x6f34141
        VPID Features=0xf01
  TSC: P-state invariant, performance statistics
64-Byte prefetching
L2 cache: 2048 kbytes, 16-way associative, 64 bytes/line
real memory  = 17179869184 (16384 MB)
Physical memory chunk(s):
0x0001 - 0x0009dfff, 581632 bytes (142 pages)
0x0009f000 - 0x0009, 4096 bytes (1 pages)
0x0010 - 0x5fff, 1609564160 bytes (392960 pages)
0x62401000 - 0x7264dfff, 270848000 bytes (66125 pages)
0x75fff000 - 0x75ff, 4096 bytes (1 pages)
0x00011000 - 0x000462497fff, 14533881856 bytes (3548311 pages)
0x00047fa0 - 0x00047fb68fff, 1478656 bytes (361 pages)
avail memory = 16363008000 (15604 MB)
CPU microcode: updated from 0xc to 0x10
MADT: Found CPU APIC ID 0 ACPI ID 0: enabled
SMP: Added CPU 0 (AP)
MADT: Found CPU APIC ID 2 ACPI ID 1: enabled
SMP: Added CPU 2 (AP)
MADT: Found CPU APIC ID 4 ACPI ID 2: enabled
SMP: Added CPU 4 (AP)
MADT: Found CPU APIC ID 6 ACPI ID 3: enabled
SMP: Added CPU 6 (AP)

On start-up, vm.pmap.pcid_invlpg_workaround=1 but seemingly random 
faults still occurred under load, for example, 'make buildworld'. 
Apparent misreads of source-files resulting in syntax errors were the 
most common symptom. Compilation reattempts (mostly) succeed.


Initially, I put this down to an inadequate power-supply but setting 
vm.pmap.pcid_enabled=0 seems to have stabilised it.
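
As a minimal sketch of the workaround mentioned above, the following Python helper checks whether a loader.conf-style file sets vm.pmap.pcid_enabled to 0. The helper name loader_tunable and the sample file contents are hypothetical; only the tunable name and the /boot/loader.conf location come from this thread.

```python
import re

def loader_tunable(conf_text, name):
    """Return the last value assigned to `name` in loader.conf-style text, or None."""
    value = None
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and surrounding whitespace
        m = re.fullmatch(r'([\w.]+)\s*=\s*"?([^"]*)"?', line)
        if m and m.group(1) == name:
            value = m.group(2)                # keep the last assignment seen
    return value

# Hypothetical file contents, for illustration only.
sample = '''
# /boot/loader.conf
zfs_load="YES"
vm.pmap.pcid_enabled=0   # workaround discussed in this thread
'''

pcid = loader_tunable(sample, "vm.pmap.pcid_enabled")
print("PCID disabled at boot" if pcid == "0" else "PCID left at default")
```

On a real system the text would be read from /boot/loader.conf, and the running value could be compared against the vm.pmap.pcid_enabled sysctl.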


I guess there's another dragon in there .. :-(

Michael







Re: 14.0 boot failure

2023-08-08 Thread Graham Perrin

On 05/08/2023 00:45, Kevin Oberman wrote:
A new kernel built from sources pulled today (4-Aug) at 5:26 UTC fails 
to boot. …


I refrained from updating after reading this.

Any news?

TIA





Re: ZFS deadlock in 14

2023-08-08 Thread Graham Perrin

 maybe.




Re: ZFS deadlock in 14

2023-08-08 Thread Dag-Erling Smørgrav
Alan Somers  writes:
> Do you have ZFS block cloning enabled on your pool?  There were a lot
> of bugs associated with that feature.  I think that was merged on
> 3-April.

No, and this deadlock did not appear until May.

DES
-- 
Dag-Erling Smørgrav - d...@freebsd.org



Re: ZFS deadlock in 14

2023-08-08 Thread Alan Somers
On Tue, Aug 8, 2023 at 10:08 AM Dag-Erling Smørgrav  wrote:
>
> At some point between 42d088299c (4 May) and f0c9703301 (26 June), a
> deadlock was introduced in ZFS.  It is still present as of 9c2823bae9 (4
> August) and is 100% reproducible just by starting poudriere bulk in a
> 16-core VM and waiting a few hours until deadlkres kicks in.  In the
> latest instance, deadlkres complained about a bash process:

Do you have ZFS block cloning enabled on your pool?  There were a lot
of bugs associated with that feature.  I think that was merged on
3-April.

> zpool get feature@block_cloning zroot
NAME   PROPERTY   VALUE  SOURCE
zroot  feature@block_cloning  disabled   local



ZFS deadlock in 14

2023-08-08 Thread Dag-Erling Smørgrav
At some point between 42d088299c (4 May) and f0c9703301 (26 June), a
deadlock was introduced in ZFS.  It is still present as of 9c2823bae9 (4
August) and is 100% reproducible just by starting poudriere bulk in a
16-core VM and waiting a few hours until deadlkres kicks in.  In the
latest instance, deadlkres complained about a bash process:

#0  sched_switch (td=td@entry=0xfe02fb1d8000, flags=flags@entry=259) at 
/usr/src/sys/kern/sched_ule.c:2299
#1  0x80b5a0a3 in mi_switch (flags=flags@entry=259) at 
/usr/src/sys/kern/kern_synch.c:550
#2  0x80babcb4 in sleepq_switch (wchan=0xf818543a9e70, pri=64) 
at /usr/src/sys/kern/subr_sleepqueue.c:609
#3  0x80babb8c in sleepq_wait (wchan=, 
pri=) at /usr/src/sys/kern/subr_sleepqueue.c:660
#4  0x80b1c1b0 in sleeplk (lk=lk@entry=0xf818543a9e70, 
flags=flags@entry=2121728, ilk=ilk@entry=0x0, 
wmesg=wmesg@entry=0x8222a054 "zfs", pri=, pri@entry=64, 
timo=timo@entry=6, queue=1) at /usr/src/sys/kern/kern_lock.c:310
#5  0x80b1a23f in lockmgr_slock_hard (lk=0xf818543a9e70, 
flags=2121728, ilk=, file=0x812544fb 
"/usr/src/sys/kern/vfs_subr.c", line=3057, lwa=0x0) at 
/usr/src/sys/kern/kern_lock.c:705
#6  0x80c59ec3 in VOP_LOCK1 (vp=0xf818543a9e00, flags=2105344, 
file=0x812544fb "/usr/src/sys/kern/vfs_subr.c", line=3057) at 
./vnode_if.h:1120
#7  _vn_lock (vp=vp@entry=0xf818543a9e00, flags=2105344, 
file=, line=, line@entry=3057) at 
/usr/src/sys/kern/vfs_vnops.c:1815
#8  0x80c4173d in vget_finish (vp=0xf818543a9e00, 
flags=, vs=vs@entry=VGET_USECOUNT) at 
/usr/src/sys/kern/vfs_subr.c:3057
#9  0x80c1c9b7 in cache_lookup (dvp=dvp@entry=0xf802cd02ac40, 
vpp=vpp@entry=0xfe046b20ac30, cnp=cnp@entry=0xfe046b20ac58, 
tsp=tsp@entry=0x0, ticksp=ticksp@entry=0x0) at 
/usr/src/sys/kern/vfs_cache.c:2086
#10 0x80c2150c in vfs_cache_lookup (ap=) at 
/usr/src/sys/kern/vfs_cache.c:3068
#11 0x80c32c37 in VOP_LOOKUP (dvp=0xf802cd02ac40, 
vpp=0xfe046b20ac30, cnp=0xfe046b20ac58) at ./vnode_if.h:69
#12 vfs_lookup (ndp=ndp@entry=0xfe046b20abd8) at 
/usr/src/sys/kern/vfs_lookup.c:1266
#13 0x80c31ce1 in namei (ndp=ndp@entry=0xfe046b20abd8) at 
/usr/src/sys/kern/vfs_lookup.c:689
#14 0x80c52090 in kern_statat (td=0xfe02fb1d8000, 
flag=, fd=-100, path=0xa75b480e070 , pathseg=pathseg@entry=UIO_USERSPACE, 
sbp=sbp@entry=0xfe046b20ad18)
at /usr/src/sys/kern/vfs_syscalls.c:2441
#15 0x80c52797 in sys_fstatat (td=, 
uap=0xfe02fb1d8400) at /usr/src/sys/kern/vfs_syscalls.c:2419
#16 0x81049398 in syscallenter (td=) at 
/usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:190
#17 amd64_syscall (td=0xfe02fb1d8000, traced=0) at 
/usr/src/sys/amd64/amd64/trap.c:1199
#18 

The lock it is trying to acquire in frame 5 belongs to another bash
process which is in the process of creating a fifo:

#0  sched_switch (td=td@entry=0xfe046acd8e40, flags=flags@entry=259) at 
/usr/src/sys/kern/sched_ule.c:2299
#1  0x80b5a0a3 in mi_switch (flags=flags@entry=259) at 
/usr/src/sys/kern/kern_synch.c:550
#2  0x80babcb4 in sleepq_switch (wchan=0xf8018acbf154, pri=87) 
at /usr/src/sys/kern/subr_sleepqueue.c:609
#3  0x80babb8c in sleepq_wait (wchan=, 
pri=) at /usr/src/sys/kern/subr_sleepqueue.c:660
#4  0x80b59606 in _sleep (ident=ident@entry=0xf8018acbf154, 
lock=lock@entry=0xf8018acbf120, priority=priority@entry=87, 
wmesg=0x8223af0e "zfs teardown inactive", sbt=sbt@entry=0, 
pr=pr@entry=0, flags=256)
at /usr/src/sys/kern/kern_synch.c:225
#5  0x80b45dc0 in rms_rlock_fallback (rms=0xf8018acbf120) at 
/usr/src/sys/kern/kern_rmlock.c:1015
#6  0x80b45c93 in rms_rlock (rms=, 
rms@entry=0xf8018acbf120) at /usr/src/sys/kern/kern_rmlock.c:1036
#7  0x81fb147b in zfs_freebsd_reclaim (ap=) at 
/usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c:5164
#8  0x8111d245 in VOP_RECLAIM_APV (vop=0x822e71a0 
, a=a@entry=0xfe0410f1c9c8) at vnode_if.c:2180
#9  0x80c43569 in VOP_RECLAIM (vp=0xf802cdbaca80) at 
./vnode_if.h:1084
#10 vgonel (vp=vp@entry=0xf802cdbaca80) at 
/usr/src/sys/kern/vfs_subr.c:4143
#11 0x80c3ef61 in vtryrecycle (vp=0xf802cdbaca80) at 
/usr/src/sys/kern/vfs_subr.c:1693
#12 vnlru_free_impl (count=count@entry=1, mnt_op=mnt_op@entry=0x0, 
mvp=0xf8010864da00) at /usr/src/sys/kern/vfs_subr.c:1344
#13 0x80c49553 in vnlru_free_locked (count=1) at 
/usr/src/sys/kern/vfs_subr.c:1357
#14 vn_alloc_hard (mp=mp@entry=0x0) at /usr/src/sys/kern/vfs_subr.c:1744
#15 0x80c3f6f0 in vn_alloc (mp=0x0) at 
/usr/src/sys/amd64/include/atomic.h:375
#16 getnewvnode_reserve () at /usr/src/sys/kern/vfs_subr.c:1888
#17 

[SOLVED] 14-CURRENT | alternatives for defunct /usr/lib/pam_opie.so?

2023-08-08 Thread Michael Grimm
Dag-Erling Smørgrav  wrote:
> Michael Grimm  writes:

>> I'm currently in the process of preparing for the upcoming 14-STABLE. Thus,
>> I upgraded one of my systems from 13-STABLE to 14-CURRENT.
> 
> Did you run etcupdate?

I was just about to report that I somehow managed to "solve" this issue with
pam_opie, without knowing how.

Yep, I did run etcupdate, but not before reporting my issue. This morning I
reinstalled world, kernel and all my ports, and ran etcupdate on the host and in
all my ports. After removing security/opie, everything is fine now.
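
For readers wondering why ldd never showed pam_opie.so: PAM modules are loaded at run time by libpam according to the policy files, not linked into the program, which matches the openpam_load_module() error quoted in the original report. The sketch below scans policy text for lines that still name a given module. The function name pam_module_refs and the sample policy lines are illustrative assumptions; the actual cause on this system was not confirmed in the thread.

```python
def pam_module_refs(policy_text, module):
    """Return (line_number, line) pairs in a PAM policy that reference `module`."""
    hits = []
    for number, raw in enumerate(policy_text.splitlines(), start=1):
        line = raw.split("#", 1)[0]          # ignore comments
        fields = line.split()
        # Policy lines have the form: facility control module [options...]
        if len(fields) >= 3 and fields[2].endswith(module):
            hits.append((number, raw.strip()))
    return hits

# Illustrative policy text; the exact lines on a given system may differ.
sample_policy = """\
# auth
auth    sufficient  pam_opie.so        no_warn no_fake_prompts
auth    requisite   pam_opieaccess.so  no_warn allow_local
auth    required    pam_unix.so        no_warn try_first_pass nullok
"""

for number, line in pam_module_refs(sample_policy, "pam_opie.so"):
    print(f"line {number}: {line}")
```

Running the same scan over the files under /etc/pam.d before and after etcupdate would show whether stale references were what etcupdate cleaned up.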

Sorry for the noise, I should have known better!

Regards and thanks to all,
Michael


Re: 14-CURRENT | alternatives for defunct /usr/lib/pam_opie.so?

2023-08-08 Thread Dag-Erling Smørgrav
Michael Grimm  writes:
> I'm currently in the process of preparing for the upcoming 14-STABLE. Thus,
> I upgraded one of my systems from 13-STABLE to 14-CURRENT.

Did you run etcupdate?

DES
-- 
Dag-Erling Smørgrav - d...@freebsd.org



Re: sys/modules/Makefile and MACHINE_ARCH vs arm64 (in use) vs aarch64 (not in use) VS. man arch; also COMPAT_FREEBSD32_ENABLED use

2023-08-08 Thread Mark Millard
On Aug 2, 2023, at 17:25, Mark Millard  wrote:

> On Aug 2, 2023, at 12:56, Mark Millard  wrote:
> 
>> On Aug 2, 2023, at 11:16, Warner Losh  wrote:
>> 
>>> Those all look wrong to me.
>>> 
>>> Warner 
>>> 
>>> On Wed, Aug 2, 2023, 11:27 AM Mark Millard  wrote:
>>> man arch reports:
>>> 
>>>  MACHINE   MACHINE_CPUARCH   MACHINE_ARCH
>>>  arm64 aarch64   aarch64
>>> . . .
>>>  arm   arm   armv6, armv7
>>> 
>>> So I'd not expect arm64 in MACHINE_ARCH . But
>>> sys/modules/Makefile has (from a grep for MACHINE_ARCH):
>>> 
>>> .if ${MACHINE_ARCH} == "amd64" || ${MACHINE_ARCH} == "arm64"
>>> .if ${MACHINE_ARCH} == "amd64" || ${MACHINE_ARCH} == "arm64" || 
>>> ${MACHINE_ARCH:Mpowerpc64*}
>>> 
>>> 
>>> Another issue may be that COMPAT_FREEBSD32_ENABLED is only
>>> put to use in the Makefile for MACHINE_CPUARCH being i386
>>> or amd64 :
>>> 
>>> .if ${MACHINE_CPUARCH} == "i386" || ${MACHINE_CPUARCH} == "amd64"
>>> _agp=   agp
>>> .if ${MACHINE_CPUARCH} == "i386" || !empty(COMPAT_FREEBSD32_ENABLED)
>>> . . .
>> 
>> 
>> I'll note that, for example, i386 vs. armv7 do not match
>> for some struct md_ioctl field offsets and the overall
>> size.
> 
> Turns out no member offsets were different but the size
> was: just differing tail padding in the structure. Still
> it means some conditional differences across i386 and
> armv7. (I've no clue if the 32-bit powerpc lib32/chroot
> handling is working on powerpc64 vs. not. So I make no
> claims relative to such.)

See:

https://lists.freebsd.org/archives/dev-commits-src-main/2023-August/017561.html
(git: 58a46cfd751a - main - md driver compat32: fix structure padding for arm, 
powerpc)

for Mike Karels' fix for main for that "turns out". It avoids one kind
of kyua run problem for armv7 on aarch64, since mdconfig is used for
some of the tests.

(The ${MACHINE_ARCH} == "arm64" and COMPAT_FREEBSD32_ENABLED use
are still as they were in sys/modules/Makefile . This note is not
about that issue.)
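
To illustrate how two 32-bit ABIs can agree on every member offset and still disagree on the structure size, here is a small layout calculator. It assumes that 8-byte integers are aligned to 4 bytes under the i386 ABI and to 8 bytes under the armv7 ABI. The member list is hypothetical and is not the real struct md_ioctl definition.

```python
def struct_layout(members, align64):
    """Compute member offsets and total size for a C struct.

    members: list of (name, size_in_bytes); each member is assumed to be a
    plain integer aligned to its own size, except that 8-byte members are
    aligned to `align64` bytes (4 for i386, 8 for armv7 in this sketch).
    """
    offsets, offset, struct_align = {}, 0, 1
    for name, size in members:
        align = align64 if size == 8 else size
        offset = -(-offset // align) * align      # round up to member alignment
        offsets[name] = offset
        offset += size
        struct_align = max(struct_align, align)
    total = -(-offset // struct_align) * struct_align   # add tail padding
    return offsets, total

# Hypothetical members, not the real struct md_ioctl definition.
members = [("version", 4), ("unit", 4), ("size", 8), ("flags", 4)]

i386_offsets, i386_size = struct_layout(members, align64=4)
armv7_offsets, armv7_size = struct_layout(members, align64=8)

print(i386_offsets == armv7_offsets)   # member offsets agree
print(i386_size, armv7_size)           # total sizes differ by tail padding
```

Because the ioctl command number encodes the argument size, a size mismatch of this kind can be enough to make a compat32 ioctl fail to match even when every field sits at the same offset.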

>> Mike Karels is looking at getting struct md_ioctl32
>> correctly matching each of the contexts: i386, (32-bit)
>> powerpc, and armv7.
>> 
>> I do not know if there are other COMPAT_FREEBSD32 adjustments
>> needed for differences in memory layout across the 3 (i386,
>> powerpc, armv7). md_ioctl I learned about via kyua test runs
>> and looking at the background for some things it reported for
>> armv7.
>> 
>> I've not found a clear indication of what is expected to work
>> for chroot/lib32 vs. what is not expected to work. It seems
>> one must look in the code and see if one finds conditional
>> material based, in part, on COMPAT_FREEBSD32. It might also
>> be that COMPAT_FREEBSD32 for i386 vs. armv7 vs. powerpc
>> might not be intending identical coverage for all I know.
>> So seeing COMPAT_FREEBSD32 might not be enough to know the
>> intent.




===
Mark Millard
marklmi at yahoo.com




Re: [Intel AlderLake] Read files to FAT32 or UFS partition cause data corrupt due to P-Core-Core

2023-08-08 Thread Tomoaki AOKI
On Tue, 8 Aug 2023 17:02:32 +0300
Konstantin Belousov  wrote:

> On Tue, Aug 08, 2023 at 10:46:12PM +0900, Tomoaki AOKI wrote:
> > On Tue, 8 Aug 2023 15:38:46 +0300
> > Konstantin Belousov  wrote:
> > 
> > > On Tue, Aug 08, 2023 at 06:37:35AM +0900, Tomoaki AOKI wrote:
> > > > On Sun, 6 Aug 2023 12:55:07 +0300
> > > > Konstantin Belousov  wrote:
> > > > 
> > > > > On Sun, Aug 06, 2023 at 06:12:38PM +0900, Tomoaki AOKI wrote:
> > > > > > On Wed, 23 Feb 2022 01:30:28 +0200
> > > > > > Konstantin Belousov  wrote:
> > > > > > 
> > > > > > > On Tue, Feb 22, 2022 at 06:23:17PM -0500, Alexander Motin wrote:
> > > > > > > > On 22.02.2022 17:46, Konstantin Belousov wrote:
> > > > > > > > > Ok, the next step is to get the CPU feature reports from P- 
> > > > > > > > > vs. E- cores.
> > > > > > > > > Patch below should work, with verbose boot.
> > > > > > > > 
> > > > > > > > Not much difference on that level:
> > > > > > > > 
> > > > > > > > --- zzzp2022-02-22 18:18:24.531704000 -0500
> > > > > > > > +++ zzze2022-02-22 18:18:18.631236000 -0500
> > > > > > > > @@ -1,22 +1,21 @@
> > > > > > > > -CPU 2: 12th Gen Intel(R) Core(TM) i7-12700K (3609.60-MHz 
> > > > > > > > K8-class CPU)
> > > > > > > > +CPU 16: 12th Gen Intel(R) Core(TM) i7-12700K (3609.60-MHz 
> > > > > > > > K8-class CPU)
> > > > > > > >Origin="GenuineIntel"  Id=0x90672  Family=0x6  Model=0x97  
> > > > > > > > Stepping=2
> > > > > > > > Features=0xbfebfbff
> > > > > > > > Features2=0x7ffafbff
> > > > > > > >AMD Features=0x2c100800
> > > > > > > >AMD Features2=0x121
> > > > > > > >Structured Extended 
> > > > > > > > Features=0x239ca7eb
> > > > > > > >Structured Extended 
> > > > > > > > Features2=0x98c027ac
> > > > > > > >Structured Extended 
> > > > > > > > Features3=0xfc1cc410
> > > > > > > >XSAVE Features=0xf
> > > > > > > >
> > > > > > > > IA32_ARCH_CAPS=0xd6b
> > > > > > > >VT-x: Basic Features=0x3da0500
> > > > > > > >  Pin-Based 
> > > > > > > > Controls=0xff
> > > > > > > >  Primary Processor 
> > > > > > > > Controls=0xfffbfffe
> > > > > > > >  Secondary Processor 
> > > > > > > > Controls=0xf5d7fff
> > > > > > > >  Exit Controls=0x3da0500
> > > > > > > >  Entry Controls=0x3da0500
> > > > > > > >  EPT 
> > > > > > > > Features=0x6f34141
> > > > > > > >  VPID 
> > > > > > > > Features=0x10f01
> > > > > > > >TSC: P-state invariant, performance statistics
> > > > > > > > -64-Byte prefetching
> > > > > > > > -L2 cache: 1280 kbytes, 8-way associative, 64 bytes/line
> > > > > > > > +L2 cache: 2048 kbytes, 16-way associative, 64 bytes/line
> > > > > > > > 
> > > > > > > 
> > > > > > > Show me the full verbose dmesg of the boot then.
> > > > > > > 
> > > > > > > As another blind guess, try to disable pcid, 
> > > > > > > vm.pmap.pcid_enabled=0.
> > > > > > > 
> > > > > > 
> > > > > > Hi.
> > > > > > 
> > > > > > Intel N100 is reported to crash without this tunable on 13.2 at
> > > > > > freebsd-users-jp ML (as this is a ML in Japanese, reported in
> > > > > > Japanese). [1]
> > > > > > Crashes with UFS, but ZFS is claimed to be OK.
> > > > > > 
> > > > > > N100 is an Alder Lake-N processor WITHOUT P-CORE. [2] [3]
> > > > > > So the check logic in the workaround code (IIRC, all MFC'ed before 13.2)
> > > > > > wouldn't be working?
> > > > > 
> > > > > Show me the output from x86info -r on the machine, I do not care which
> > > > > specific core it is, they should be all the same.  x86info is 
> > > > > available
> > > > > as sysutils/x86info.
> > > > 
> > > > Requested to original reporter and got the result below.
> > > > HTH.
> > > > 
> > > > ---
> > > > root@eq12:~ # x86info -r
> > > > x86info v1.31pre
> > > > /dev/cpuctl0: No such file or directory
> > > > Found 4 identical CPUs
> > > > Extended Family: 0 Extended Model: 11 Family: 6 Model: 190 Stepping: 0
> > > > Type: 0 (Original OEM)
> > > > CPU Model (x86info's best guess): Unknown model.
> > > ...
> > > > eax in: 0x001a, eax = 2001 ebx =  ecx =  edx = 
> > > > 
> > > 
> > > The CPU is reported as small core/atom, so the workaround is turned on.
> > > I do not think that the issue reported is related to the TLB/PG_G errata.
> > > 
> > > Why do you think that this is hw issue at all, and not some software bug
> > > in the build etc ?
> > 
> > Because the issue looks similar (crashes on UFS but not ZFS, and as far
> > as the original reporter tested, vm.pmap.pcid_enabled=0
> > in /boot/loader.conf helped).
> > 
> > Moreover, N100 CPU is Alder Lake-N. So potentially includes the same
> > design issue (common circuits, firmwares, ...).
> > 
> > So I suspected the same problem persists even without P-core and
> > advised the original reporter to add the workaround
> > in /boot/loader.conf.
> > It seems to help until now.
> The workaround is switched on automatically, when kernel detects 'small cores'
> reported by CPUID.

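If I read the code correctly, vm.pmap.pcid_invlpg_workaround
(precisely, the corresponding variable) is set to non-zero when the
workaround is enabled. I'm not sure it was detected correctly in the
original reporter's environment, but forcibly setting the tunable to 1
was not reported to help sufficiently.
Currently, only setting the tunable vm.pmap.pcid_enabled to 0 seems to help.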

Re: Has the update procedure changed?

2023-08-08 Thread Dag-Erling Smørgrav
Matthias Apitz  writes:
> I know the reason (the install process uses the old existing kldxref, which
> cannot handle some things). I proceeded with the installation and all
> is fine; the box is up again in multiuser and /usr/sbin/kldxref is now
> from today. Should I run 'make installkernel' a 2nd time?

'make reinstallkernel' is more appropriate in this case.

DES
-- 
Dag-Erling Smørgrav - d...@freebsd.org



Re: [Intel AlderLake] Read files to FAT32 or UFS partition cause data corrupt due to P-Core-Core

2023-08-08 Thread Konstantin Belousov
On Tue, Aug 08, 2023 at 10:46:12PM +0900, Tomoaki AOKI wrote:
> On Tue, 8 Aug 2023 15:38:46 +0300
> Konstantin Belousov  wrote:
> 
> > On Tue, Aug 08, 2023 at 06:37:35AM +0900, Tomoaki AOKI wrote:
> > > On Sun, 6 Aug 2023 12:55:07 +0300
> > > Konstantin Belousov  wrote:
> > > 
> > > > On Sun, Aug 06, 2023 at 06:12:38PM +0900, Tomoaki AOKI wrote:
> > > > > On Wed, 23 Feb 2022 01:30:28 +0200
> > > > > Konstantin Belousov  wrote:
> > > > > 
> > > > > > On Tue, Feb 22, 2022 at 06:23:17PM -0500, Alexander Motin wrote:
> > > > > > > On 22.02.2022 17:46, Konstantin Belousov wrote:
> > > > > > > > Ok, the next step is to get the CPU feature reports from P- vs. 
> > > > > > > > E- cores.
> > > > > > > > Patch below should work, with verbose boot.
> > > > > > > 
> > > > > > > Not much difference on that level:
> > > > > > > 
> > > > > > > --- zzzp2022-02-22 18:18:24.531704000 -0500
> > > > > > > +++ zzze2022-02-22 18:18:18.631236000 -0500
> > > > > > > @@ -1,22 +1,21 @@
> > > > > > > -CPU 2: 12th Gen Intel(R) Core(TM) i7-12700K (3609.60-MHz 
> > > > > > > K8-class CPU)
> > > > > > > +CPU 16: 12th Gen Intel(R) Core(TM) i7-12700K (3609.60-MHz 
> > > > > > > K8-class CPU)
> > > > > > >Origin="GenuineIntel"  Id=0x90672  Family=0x6  Model=0x97  
> > > > > > > Stepping=2
> > > > > > > Features=0xbfebfbff
> > > > > > > Features2=0x7ffafbff
> > > > > > >AMD Features=0x2c100800
> > > > > > >AMD Features2=0x121
> > > > > > >Structured Extended 
> > > > > > > Features=0x239ca7eb
> > > > > > >Structured Extended 
> > > > > > > Features2=0x98c027ac
> > > > > > >Structured Extended 
> > > > > > > Features3=0xfc1cc410
> > > > > > >XSAVE Features=0xf
> > > > > > >
> > > > > > > IA32_ARCH_CAPS=0xd6b
> > > > > > >VT-x: Basic Features=0x3da0500
> > > > > > >  Pin-Based Controls=0xff
> > > > > > >  Primary Processor 
> > > > > > > Controls=0xfffbfffe
> > > > > > >  Secondary Processor 
> > > > > > > Controls=0xf5d7fff
> > > > > > >  Exit Controls=0x3da0500
> > > > > > >  Entry Controls=0x3da0500
> > > > > > >  EPT 
> > > > > > > Features=0x6f34141
> > > > > > >  VPID 
> > > > > > > Features=0x10f01
> > > > > > >TSC: P-state invariant, performance statistics
> > > > > > > -64-Byte prefetching
> > > > > > > -L2 cache: 1280 kbytes, 8-way associative, 64 bytes/line
> > > > > > > +L2 cache: 2048 kbytes, 16-way associative, 64 bytes/line
> > > > > > > 
> > > > > > 
> > > > > > Show me the full verbose dmesg of the boot then.
> > > > > > 
> > > > > > As another blind guess, try to disable pcid, vm.pmap.pcid_enabled=0.
> > > > > > 
> > > > > 
> > > > > Hi.
> > > > > 
> > > > > Intel N100 is reported to crash without this tunable on 13.2 at
> > > > > freebsd-users-jp ML (as this is a ML in Japanese, reported in
> > > > > Japanese). [1]
> > > > > Crashes with UFS, but ZFS is claimed to be OK.
> > > > > 
> > > > > N100 is an Alder Lake-N processor WITHOUT P-CORE. [2] [3]
> > > > > So the check logic in the workaround code (IIRC, all MFC'ed before 13.2)
> > > > > wouldn't be working?
> > > > 
> > > > Show me the output from x86info -r on the machine, I do not care which
> > > > specific core it is, they should be all the same.  x86info is available
> > > > as sysutils/x86info.
> > > 
> > > Requested to original reporter and got the result below.
> > > HTH.
> > > 
> > > ---
> > > root@eq12:~ # x86info -r
> > > x86info v1.31pre
> > > /dev/cpuctl0: No such file or directory
> > > Found 4 identical CPUs
> > > Extended Family: 0 Extended Model: 11 Family: 6 Model: 190 Stepping: 0
> > > Type: 0 (Original OEM)
> > > CPU Model (x86info's best guess): Unknown model.
> > ...
> > > eax in: 0x001a, eax = 2001 ebx =  ecx =  edx = 
> > > 
> > 
> > The CPU is reported as small core/atom, so the workaround is turned on.
> > I do not think that the issue reported is related to the TLB/PG_G errata.
> > 
> > Why do you think that this is hw issue at all, and not some software bug
> > in the build etc ?
> 
> Because the issue looks similar (crashes on UFS but not ZFS, and as far
> as the original reporter tested, vm.pmap.pcid_enabled=0
> in /boot/loader.conf helped).
> 
> Moreover, N100 CPU is Alder Lake-N. So potentially includes the same
> design issue (common circuits, firmwares, ...).
> 
> So I suspected the same problem persists even without P-core and
> advised the original reporter to add the workaround
> in /boot/loader.conf.
> It seems to help until now.
The workaround is switched on automatically, when kernel detects 'small cores'
reported by CPUID.



Re: [Intel AlderLake] Read files to FAT32 or UFS partition cause data corrupt due to P-Core-Core

2023-08-08 Thread Tomoaki AOKI
On Tue, 8 Aug 2023 15:38:46 +0300
Konstantin Belousov  wrote:

> On Tue, Aug 08, 2023 at 06:37:35AM +0900, Tomoaki AOKI wrote:
> > On Sun, 6 Aug 2023 12:55:07 +0300
> > Konstantin Belousov  wrote:
> > 
> > > On Sun, Aug 06, 2023 at 06:12:38PM +0900, Tomoaki AOKI wrote:
> > > > On Wed, 23 Feb 2022 01:30:28 +0200
> > > > Konstantin Belousov  wrote:
> > > > 
> > > > > On Tue, Feb 22, 2022 at 06:23:17PM -0500, Alexander Motin wrote:
> > > > > > On 22.02.2022 17:46, Konstantin Belousov wrote:
> > > > > > > Ok, the next step is to get the CPU feature reports from P- vs. 
> > > > > > > E- cores.
> > > > > > > Patch below should work, with verbose boot.
> > > > > > 
> > > > > > Not much difference on that level:
> > > > > > 
> > > > > > --- zzzp2022-02-22 18:18:24.531704000 -0500
> > > > > > +++ zzze2022-02-22 18:18:18.631236000 -0500
> > > > > > @@ -1,22 +1,21 @@
> > > > > > -CPU 2: 12th Gen Intel(R) Core(TM) i7-12700K (3609.60-MHz K8-class 
> > > > > > CPU)
> > > > > > +CPU 16: 12th Gen Intel(R) Core(TM) i7-12700K (3609.60-MHz K8-class 
> > > > > > CPU)
> > > > > >Origin="GenuineIntel"  Id=0x90672  Family=0x6  Model=0x97  
> > > > > > Stepping=2
> > > > > > Features=0xbfebfbff
> > > > > > Features2=0x7ffafbff
> > > > > >AMD Features=0x2c100800
> > > > > >AMD Features2=0x121
> > > > > >Structured Extended 
> > > > > > Features=0x239ca7eb
> > > > > >Structured Extended 
> > > > > > Features2=0x98c027ac
> > > > > >Structured Extended 
> > > > > > Features3=0xfc1cc410
> > > > > >XSAVE Features=0xf
> > > > > >
> > > > > > IA32_ARCH_CAPS=0xd6b
> > > > > >VT-x: Basic Features=0x3da0500
> > > > > >  Pin-Based Controls=0xff
> > > > > >  Primary Processor 
> > > > > > Controls=0xfffbfffe
> > > > > >  Secondary Processor 
> > > > > > Controls=0xf5d7fff
> > > > > >  Exit Controls=0x3da0500
> > > > > >  Entry Controls=0x3da0500
> > > > > >  EPT 
> > > > > > Features=0x6f34141
> > > > > >  VPID 
> > > > > > Features=0x10f01
> > > > > >TSC: P-state invariant, performance statistics
> > > > > > -64-Byte prefetching
> > > > > > -L2 cache: 1280 kbytes, 8-way associative, 64 bytes/line
> > > > > > +L2 cache: 2048 kbytes, 16-way associative, 64 bytes/line
> > > > > > 
> > > > > 
> > > > > Show me the full verbose dmesg of the boot then.
> > > > > 
> > > > > As another blind guess, try to disable pcid, vm.pmap.pcid_enabled=0.
> > > > > 
> > > > 
> > > > Hi.
> > > > 
> > > > Intel N100 is reported to crash without this tunable on 13.2 at
> > > > freebsd-users-jp ML (as this is a ML in Japanese, reported in
> > > > Japanese). [1]
> > > > Crashes with UFS, but ZFS is claimed to be OK.
> > > > 
> > > > N100 is an Alder Lake-N processor WITHOUT P-CORE. [2] [3]
> > > > So the check logic in the workaround code (IIRC, all MFC'ed before 13.2)
> > > > wouldn't be working?
> > > 
> > > Show me the output from x86info -r on the machine, I do not care which
> > > specific core it is, they should be all the same.  x86info is available
> > > as sysutils/x86info.
> > 
> > Requested to original reporter and got the result below.
> > HTH.
> > 
> > ---
> > root@eq12:~ # x86info -r
> > x86info v1.31pre
> > /dev/cpuctl0: No such file or directory
> > Found 4 identical CPUs
> > Extended Family: 0 Extended Model: 11 Family: 6 Model: 190 Stepping: 0
> > Type: 0 (Original OEM)
> > CPU Model (x86info's best guess): Unknown model.
> ...
> > eax in: 0x001a, eax = 2001 ebx =  ecx =  edx = 
> > 
> 
> The CPU is reported as small core/atom, so the workaround is turned on.
> I do not think that the issue reported is related to the TLB/PG_G errata.
> 
> Why do you think that this is hw issue at all, and not some software bug
> in the build etc ?

Because the issue looks similar (crashes on UFS but not ZFS, and as far
as the original reporter tested, vm.pmap.pcid_enabled=0
in /boot/loader.conf helped).

Moreover, the N100 CPU is Alder Lake-N, so it potentially includes the same
design issue (common circuits, firmware, ...).

So I suspected the same problem persists even without P-cores and
advised the original reporter to add the workaround
in /boot/loader.conf.
It seems to have helped so far.

-- 
Tomoaki AOKI



Re: [Intel AlderLake] Read files to FAT32 or UFS partition cause data corrupt due to P-Core-Core

2023-08-08 Thread Konstantin Belousov
On Tue, Aug 08, 2023 at 06:37:35AM +0900, Tomoaki AOKI wrote:
> On Sun, 6 Aug 2023 12:55:07 +0300
> Konstantin Belousov  wrote:
> 
> > On Sun, Aug 06, 2023 at 06:12:38PM +0900, Tomoaki AOKI wrote:
> > > On Wed, 23 Feb 2022 01:30:28 +0200
> > > Konstantin Belousov  wrote:
> > > 
> > > > On Tue, Feb 22, 2022 at 06:23:17PM -0500, Alexander Motin wrote:
> > > > > On 22.02.2022 17:46, Konstantin Belousov wrote:
> > > > > > Ok, the next step is to get the CPU feature reports from P- vs. E- 
> > > > > > cores.
> > > > > > Patch below should work, with verbose boot.
> > > > > 
> > > > > Not much difference on that level:
> > > > > 
> > > > > --- zzzp2022-02-22 18:18:24.531704000 -0500
> > > > > +++ zzze2022-02-22 18:18:18.631236000 -0500
> > > > > @@ -1,22 +1,21 @@
> > > > > -CPU 2: 12th Gen Intel(R) Core(TM) i7-12700K (3609.60-MHz K8-class 
> > > > > CPU)
> > > > > +CPU 16: 12th Gen Intel(R) Core(TM) i7-12700K (3609.60-MHz K8-class 
> > > > > CPU)
> > > > >Origin="GenuineIntel"  Id=0x90672  Family=0x6  Model=0x97  
> > > > > Stepping=2
> > > > > Features=0xbfebfbff
> > > > > Features2=0x7ffafbff
> > > > >AMD Features=0x2c100800
> > > > >AMD Features2=0x121
> > > > >Structured Extended 
> > > > > Features=0x239ca7eb
> > > > >Structured Extended 
> > > > > Features2=0x98c027ac
> > > > >Structured Extended 
> > > > > Features3=0xfc1cc410
> > > > >XSAVE Features=0xf
> > > > >IA32_ARCH_CAPS=0xd6b
> > > > >VT-x: Basic Features=0x3da0500
> > > > >  Pin-Based Controls=0xff
> > > > >  Primary Processor 
> > > > > Controls=0xfffbfffe
> > > > >  Secondary Processor 
> > > > > Controls=0xf5d7fff
> > > > >  Exit Controls=0x3da0500
> > > > >  Entry Controls=0x3da0500
> > > > >  EPT 
> > > > > Features=0x6f34141
> > > > >  VPID 
> > > > > Features=0x10f01
> > > > >TSC: P-state invariant, performance statistics
> > > > > -64-Byte prefetching
> > > > > -L2 cache: 1280 kbytes, 8-way associative, 64 bytes/line
> > > > > +L2 cache: 2048 kbytes, 16-way associative, 64 bytes/line
> > > > > 
> > > > 
> > > > Show me the full verbose dmesg of the boot then.
> > > > 
> > > > As another blind guess, try to disable pcid, vm.pmap.pcid_enabled=0.
> > > > 
> > > 
> > > Hi.
> > > 
> > > Intel N100 is reported to crash without this tunable on 13.2 at
> > > freebsd-users-jp ML (as this is a ML in Japanese, reported in
> > > Japanese). [1]
> > > Crashes with UFS, but ZFS is claimed to be OK.
> > > 
> > > N100 is an Alder Lake-N processor WITHOUT P-CORE. [2] [3]
> > > So the check logic in the workaround code (IIRC, all MFC'ed before 13.2)
> > > wouldn't be working?
> > 
> > Show me the output from x86info -r on the machine, I do not care which
> > specific core it is, they should be all the same.  x86info is available
> > as sysutils/x86info.
> 
> Requested to original reporter and got the result below.
> HTH.
> 
> ---
> root@eq12:~ # x86info -r
> x86info v1.31pre
> /dev/cpuctl0: No such file or directory
> Found 4 identical CPUs
> Extended Family: 0 Extended Model: 11 Family: 6 Model: 190 Stepping: 0
> Type: 0 (Original OEM)
> CPU Model (x86info's best guess): Unknown model.
...
> eax in: 0x001a, eax = 2001 ebx =  ecx =  edx = 
> 

The CPU is reported as small core/atom, so the workaround is turned on.
I do not think that the issue reported is related to the TLB/PG_G errata.

Why do you think that this is a hw issue at all, and not some software bug
in the build etc.?
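
As background for the "small core/atom" remark: CPUID leaf 0x1A reports a core type in bits 31:24 of EAX, where 0x20 denotes Intel Atom and 0x40 denotes Intel Core. The sketch below decodes that field. The function name core_type and the register values are illustrative assumptions and are not taken from the x86info output quoted above; how the kernel itself performs the check is not shown in this thread.

```python
CORE_TYPES = {0x20: "Intel Atom (E-core)", 0x40: "Intel Core (P-core)"}

def core_type(leaf_1a_eax):
    """Decode the core-type field, bits 31:24 of EAX from CPUID leaf 0x1A."""
    code = (leaf_1a_eax >> 24) & 0xFF
    return CORE_TYPES.get(code, f"unknown (0x{code:02x})")

# Illustrative register values only.
print(core_type(0x20000000))
print(core_type(0x40000000))
```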



Re: 14-CURRENT | alternatives for defunct /usr/lib/pam_opie.so?

2023-08-08 Thread Ronald Klop

From: Michael Grimm 
Date: Monday, 7 August 2023 22:43
To: freebsd-current@freebsd.org
Subject: 14-CURRENT | alternatives for defunct /usr/lib/pam_opie.so?


Hi,

I'm currently preparing for the upcoming 14-STABLE. Thus, I 
upgraded one of my systems from 13-STABLE to 14-CURRENT.

Everything went fine, except for programs that need /usr/lib/pam_opie.so which 
are:

1) jexec  /usr/bin/login -u 
2) redis-server
3) mariadb1011-server

Error messages:

su[6371]: in openpam_load_module(): no pam_opie.so found
su[6371]: pam_start: System error

Well, although it was announced some time ago that pam_opie.so and 
pam_opieaccess.so would be removed in FreeBSD 14, there is a port, 
security/opie, providing both libraries. Quick workaround.

But I want to understand why the above-mentioned programs fail although they 
are not dynamically linked against /usr/lib/pam_opie.so

MWN> ldd /usr/bin/login
/usr/bin/login:
libutil.so.9 => /lib/libutil.so.9 (0xd408ecf7000)
libpam.so.6 => /usr/lib/libpam.so.6 (0xd408f6f2000)
libbsm.so.3 => /usr/lib/libbsm.so.3 (0xd4090dab000)
libc.so.7 => /lib/libc.so.7 (0xd408f99d000)
[vdso] (0xd408e18f630)

MWN> ldd /usr/local/bin/redis-server
/usr/local/bin/redis-server:
libthr.so.3 => /lib/libthr.so.3 (0x89a8847f000)
libm.so.5 => /lib/libm.so.5 (0x89a87beb000)
libexecinfo.so.1 => /usr/lib/libexecinfo.so.1 (0x89a891c7000)
libssl.so.30 => /usr/lib/libssl.so.30 (0x89a8a271000)
libcrypto.so.30 => /lib/libcrypto.so.30 (0x89a8b02b000)
libc.so.7 => /lib/libc.so.7 (0x89a8c7fe000)
libelf.so.2 => /lib/libelf.so.2 (0x89a8949b000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x89a8bb85000)
[vdso] (0x89a87323630)

MWN> ldd /usr/local/libexec/mariadbd
/usr/local/libexec/mariadbd:
libpcre2-8.so.0 => /usr/local/lib/libpcre2-8.so.0 (0x145ae576f000)
libwrap.so.6 => /usr/lib/libwrap.so.6 (0x145ae64a5000)
libcrypt.so.5 => /lib/libcrypt.so.5 (0x145ae74be000)
libz.so.6 => /lib/libz.so.6 (0x145ae7d0b000)
libm.so.5 => /lib/libm.so.5 (0x145ae8b3e000)
libexecinfo.so.1 => /usr/lib/libexecinfo.so.1 (0x145ae6e03000)
libssl.so.30 => /usr/lib/libssl.so.30 (0x145ae9575000)
libcrypto.so.30 => /lib/libcrypto.so.30 (0x145aeafff000)
libc++.so.1 => /lib/libc++.so.1 (0x145ae9e3b000)
libcxxrt.so.1 => /lib/libcxxrt.so.1 (0x145aeaa85000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x145aec745000)
libthr.so.3 => /lib/libthr.so.3 (0x145aebf1)
libc.so.7 => /lib/libc.so.7 (0x145aec7fa000)
libelf.so.2 => /lib/libelf.so.2 (0x145aee867000)
[vdso] (0x145ae5010630)

Which alternatives to pam_opie should I investigate?
Reason: I want to get rid of security/opie

Thanks and regards,
Michael

 







Hi,

Might it be possible that pam_opie is still mentioned in a file in /etc/pam.d/* 
on your machine?
An alternative might be 
https://www.freshports.org/security/pam_google_authenticator

See also: 
https://lists.freebsd.org/archives/freebsd-security/2022-September/81.html
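
To illustrate the first suggestion: a PAM policy file names its modules one per line, and `#` starts a comment, so a leftover reference can be found by scanning each line for the module name. Below is a minimal sketch in C. The function name is hypothetical and the matching is a plain substring test, not a real parser for the policy file syntax.

```c
#include <assert.h>
#include <string.h>

/*
 * Return 1 if a line from a PAM policy file mentions the given module
 * name outside of a '#' comment, 0 otherwise.
 */
static int
line_references_module(const char *line, const char *module)
{
	const char *hit, *hash;

	assert(line != NULL && module != NULL);
	hit = strstr(line, module);
	if (hit == NULL)
		return (0);
	/* Ignore matches that only occur after the comment marker. */
	hash = strchr(line, '#');
	return (hash == NULL || hit < hash);
}
```

An illustrative line such as "auth sufficient pam_opie.so no_warn" would match, while the same line prefixed with `#` would not.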

Regards,
Ronald.


Re: Has the update procedure changed?

2023-08-08 Thread Graham Perrin

On 06/08/2023 09:20, Mark Millard wrote:


… https://docs.freebsd.org/en/books/handbook/cutting-edge/#makeworld
section 26.6.1 lists:

# git pull /usr/src
check /usr/src/UPDATING
# cd /usr/src
# make -j4 buildworld
# make -j4 kernel
# shutdown -r now
# etcupdate -p
# cd /usr/src
# make installworld
# etcupdate -B
# shutdown -r now

The material in 26.6.5 does not repeat all that, it is
more of a summary that is presented. …


IMHO a summary should not omit any step that is _essential_.

etcupdate -B is non-optional.






RE: 14-CURRENT | alternatives for defunct /usr/lib/pam_opie.so?

2023-08-08 Thread Mark Millard
Michael Grimm  wrote on
Date: Mon, 07 Aug 2023 20:43:22 UTC :

> I'm currently preparing for the upcoming 14-STABLE. Thus, I 
> upgraded one of my systems from 13-STABLE to 14-CURRENT.
> 
> Everything went fine, except for programs that need /usr/lib/pam_opie.so 
> which are:
> 
> 1) jexec  /usr/bin/login -u 
> 2) redis-server
> 3) mariadb1011-server
> 
> Error messages:
> 
> su[6371]: in openpam_load_module(): no pam_opie.so found
> su[6371]: pam_start: System error
> 
> Well, although it was announced some time ago that pam_opie.so and 
> pam_opieaccess.so would be removed in FreeBSD 14, there is a port, 
> security/opie, providing both libraries. Quick workaround.
> 
> But I want to understand why the above-mentioned programs fail although 
> they are not dynamically linked against /usr/lib/pam_opie.so



openpam_load_module() ends up using dlopen() to open pam_opie.so at
run time instead of the library being bound at link time, which is
why ldd does not list it:

# grep -r openpam_load_module /usr/main-src/ | more
/usr/main-src/contrib/openpam/lib/libpam/openpam_impl.h:pam_module_t *openpam_load_module(const char *)
/usr/main-src/contrib/openpam/lib/libpam/openpam_configure.c:   if ((this->module = openpam_load_module(modulename)) == NULL) {
/usr/main-src/contrib/openpam/lib/libpam/openpam_load.c:openpam_load_module(const char *modulename)

pam_module_t *
openpam_load_module(const char *modulename)
{
	pam_module_t *module;

	module = openpam_dynamic(modulename);
	. . .
	return (module);
}

That eventually gets to the likes of:

static void *
try_dlopen(const char *modfn)
{
	int check_module_file;
	void *dlh;
	. . .
	if ((dlh = dlopen(modfn, RTLD_NOW)) == NULL) {
		openpam_log(PAM_LOG_ERROR, "%s: %s", modfn, dlerror());
		errno = 0;
		return (NULL);
	}
	return (dlh);
}

Without that load succeeding, pam_start() also reports a failure
because of the (pam_module_t *)NULL, or so I assume.
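
The effect can be reproduced outside of PAM. The sketch below is a simplified stand-in, not the OpenPAM code: the function name try_load() is hypothetical, and it only shows that a dlopen() of a missing file fails at run time even though the calling program was never linked against that file.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/*
 * Try to load a shared object by path at run time.
 * Returns 0 on success and -1 if dlopen() fails.
 */
static int
try_load(const char *modfn)
{
	void *dlh;

	assert(modfn != NULL);
	if ((dlh = dlopen(modfn, RTLD_NOW)) == NULL) {
		/* dlerror() describes why the load failed. */
		fprintf(stderr, "%s: %s\n", modfn, dlerror());
		return (-1);
	}
	dlclose(dlh);
	return (0);
}
```

Calling try_load() with a path that does not exist returns -1. A program using it would still show no trace of that path in its ldd output, since ldd only reports the libraries recorded at link time.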

===
Mark Millard
marklmi at yahoo.com




Re: Has the update procedure changed?

2023-08-08 Thread Graham Perrin

On 08/08/2023 03:50, Robert Huff wrote:


…

1) It would be really nice if someone would take a look and make
sure these are 100% non-contradictory.

…


Anyone taking a look should, ideally, also consider:


209744 – Move installation instructions from UPDATING to new file


256830 – the COMMON ITEMS: section in /usr/src/UPDATING makes no mention 
of etcupdate


… plus the discussion, which I can't find (I thought it was a BR), about 
disorderly numbering; and so on.



