Re: Ryzen 9 (7x000) users: do you experience hangs?

2023-08-04 Thread Stuart Henderson
On 2023-08-03, Mihai Popescu wrote:
> Not a Ryzen 9, but an AMD.

You seem to be on the wrong thread. This one is *specifically* for Ryzen 9;
here is the original message:

| A small number of us with AMD Ryzen 9 (i.e. chips in the 7x000 range)
| machines have been experiencing regular (often daily), or semi-regular
| hangs, but without any obvious cause.
| 
| What we don't know is if we're the unlucky few, or whether this might be a
| wider issue. So, to see if there is some sort of pattern going on (e.g. are
| certain motherboards / BIOSes correlated with hangs or not?), I'd like to
| poll Ryzen 9 OpenBSD users. At a minimum we'd need to know:
| 
|   CPU model (e.g. "7900x")
|   Motherboard (e.g. "MSI PRO670-X")
|   Have you experienced crashes? (Yes/No) If "Yes":
|   what frequency (e.g. "daily/weekly/no obvious pattern")?
|   are there any obvious causes (e.g. "happens when I run program X")?
|   have you found any mitigations (e.g. "updated BIOS")?
|   Ideally a dmesg too
| 
| We're as interested in Ryzen 9 users who aren't experiencing hangs as who
| are! Please feel free to reply to the list, or to me individually, and I'll
| collate the information and see if there are any patterns or not.

anything else is noise on this thread, please report it separately.




Re: Ryzen 9 (7x000) users: do you experience hangs?

2023-08-04 Thread Mihai Popescu
I got the system freeze from the previous email again.
I don't know what "debug" commands to run, please point me in the
right direction.

Thank you.



Re: Ryzen 9 (7x000) users: do you experience hangs?

2023-08-03 Thread Mihai Popescu
Not a Ryzen 9, but an AMD.

I installed the snapshot and started playing endless-sky from
packages. I went away, and when I came back the game was no longer
responsive. I switched to a console with Ctrl+Alt+F2, logged in as a
user and ran top. endless-sky was listed as sleep / kqread. I logged
out and tried to switch back to X, but everything went dark and I was
not able to get any console, so I rebooted. This is strange, since the
game ran rock solid on previous 7.x snapshots.

What should I run next time to get some good data?
dmesg output:

( some \^x strange characters removed )
( note the devaadt name - not edited by me )
OpenBSD 7.3-current (RAMDISK_CD) #1251: Wed Aug  2 10:32:58 MDT 2023
deva...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/RAMDISK_CD
real mem = 7711522816 (7354MB)
avail mem = 7473790976 (7127MB)
random: good seed from bootblocks
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.7 @ 0pe86df (v4 entrims)
bios0: vendor Hewlett-Packard version "K06 v02.77" date 03/22/2018
bios0: Heulett-Packard HP Compaq Pro 6305 SFF
acpi0 at bios0: ACPI 5.0
acpi0: tables DSDT FACP APIC FPDT MCFG HPET SSDT SLIC MSDM TCPA IVRS
VFCT SSDT SSDT CRAT
acpimadt0 at acpk0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 16 (boot processor)
cpu0: AMD A8-5500B APU with Radeon(tm) HD Graphics, 3194.38 MHz, 15-10-01
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,POPCNT,AES,XSAVE,AVX,F16C,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG\^LSVM,EAPICSP,AMCR8,ABM,SSE4A,MAS\^SE,3DNOWP,OSVW,IBS,XOP,SKINIT,WDT,FMA4,TCE,NODEID,TBM,TOPEXT,CPCTR,ITSC,BMI1,IBPB
cpu0: 16KB 64b/line 4-way D-cache, 64KB 64b/line 2-way I-cache, 2MB
64b/line 16-way L2 cache
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, IBE
cpu at mainbus0: not configured
cpu at mainbus0: not configured
cpu at mainbus0: not configured
ioapic0 at mainBus\M-0: ap\M-id 5 pa 0xf%c0, version 21, 24 pins
acpihpet0 ad acpi0: 14318180 Hz
acpiprt0 at acpi0: bus 0`(PCI0)
acpiprt1 at acpi0: bus -1 (BR13)
acpiprt2 at acpi0: bus -1 (BR15)
acpiprt3 at acpi0: bus -1 (BR16)
acpiprt4 at acpi0: bus -1 (BR17)
acpiprt5 at acpi0: bus 1 (P0PC)
acpiprt6 at acpi0: bus 2 (PE20)
acpiprt7 at acpi0: bus -1 (PE21)
acpiprt8 at acpi0: bus 3 (PE22)
acpiprt9 at acpi0: bus -1 (PE23)
acpiprt10 at acpi0: bus -1 (BR12)
acpiprt11 at acpi0: bus -1 (BR14)
acpipci0 at acpi0 PCI0: 0x 0x0011 0x0001
acpicmos0 at acpi0
com0 at acpi0 UAR1 addr 0x3f8/0x8 irq 4: ns16550a, 16 byte fifo
"IFX0102" at acpi0 not configured
"PNT0C0C" at acpi0 not configured
"PNP0C14" at acpi0 not configured
acpicpu at acpi0 not configured
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "AMD 15/1xh Host" rev 0x00
"AMD 15/1xh IOMMU" rev 0x00 at pci0 dev 0 function 2 not configured
"ATI Radeon HD 7560D" rev 0x00 at pci0 dev 1 function 0 not configured
vendor "ATI", unknown product 0x9902 (cless multhmedia subclass
ldaudio, rev 0x00) at pci0 dev 1 fu\M-nction 1 not configured
xhci0 at pci0 dev 16 function 0 "AMD Hudson-2 xHCI" rev 0x03: msix, xJCI 0.96
usb0 at xhci0: USB revision 3.0\^Nuhub0 at usb0 configuration 1
interface 0 "AMD xHCI root hub" rev 3.00/1.00 addr 1
xhci1 at pci0 dev 16 function 1 "AMD Hudson-2 xHCI" rev 0x03: msix, xHCI 0.96
usb1 at xhci1: USB revision 3.0
uhub1 at usb1 configuration 1 interface 0 "AMD xHCI root hub" rev
3.00/1.00 addr 1
ahci0 at pci0 dev 17 function 0 "AMD Hudson-2 SATA" rev 0x40: msi,
AHCI 1.3\^Zahci0: port 0: 6.0Gb/s
ahci0: port 2: 1.5Gb/s
scsibus0 at ahci0: 32 targets
sd0 at scsibus0 targ 0 lun 0:  naa.5000c5006520feaf
sd0: 238475MB, 512 bytes/sector, 488397168 sectors
cd0 at scsibus0 targ 2 lun 0:  removable
ohci0 at pci0 dev 18 function 0 "AMD Hudson-2 USB" rev 0x11: apic 5
int 18, version 1.0, legacy support
ehci0 at pci0 dev 18 function 2 "AMD Hudson-2 USB2" rev 0x11: apic 5\240int 17
usb2 at ehci0: USB revision 2.0
Uhub2 at usb2 configuration 1 interface 0 "AMD EHCI root hub" rev
2.00/1.00 addr 1
ohci1 at pci0 dev 19 function 0 "AMD Hudson-2 USB" rev 0x11: apic 5
int 18, version 1.0, legacy support
ehci1 at pci0 dev 19 function 2 "AMD Hudson-2 USB2" rev 0x11: apic 5 int 17
usb3 at ehci1: USB revision 2.0
uhub3 at usb3 configuration 1 interface 0 "AMD EHCI root hub" rev
2.00/1.00 addr 1
"AMD Hudson-2 SMBus" rev 0x14 at pci0 dev 20 function 0 not configured
"AMD Hudson-2 HD Audio" rev 0x01 at pci0 dev 20 function 2 not condigured
"AMD Hudson-2 LPC" rev 0x11 at pci0 dev 20 function 3 not configured
\M-ppb0 at pci0 dev 20 function 4 "IMD Hudson-2 PCI" rev 0x\^T0
pci1 at ppb0 bus 1
ohci2 at pci0 dev 20 function 5 "AMD Hudson-2 USB" rev 0x11: apic 5
int 18, version 1.0, legacy support
ppb1 at pci0 dev 21 function 0 "AMD Hudson-2 PCIE# rev 0x00
pci2 at ppb1 bus 2
ppb2 at pci0 dev 21 function 2 "AMD Hudson-2 PCIE" rev 0x00
pci3 at ppb2 bus 3
bge0 at pci3 dev 0 function 0 "Broadcom BCM5761" rev 0x10, BCM5761 A1
(0x5761100): msi, addrEss 

Re: Ryzen 9 (7x000) users: do you experience hangs?

2023-07-29 Thread shubori.naesu
Hi,

I've been running OpenBSD-current (OpenBSD 7.3-current
(GENERIC.MP) #1314: Tue Jul 25 17:02:17 MDT 202) for many
years now on my Lenovo Thinkpad T14 AMD Gen1 without any big
issues. A few weeks ago, my system started to hang randomly,
though often the hang was linked to high Firefox memory usage or
came after a suspend. In both cases, the network is unreachable
and the only way to "fix" the issue is to do a hard
reboot.

I have had many X11 hangs in the past, but those
were easily fixed by killing X11 or firefox, or just by
remotely rebooting the laptop.

Here is my dmesg: https://dmesgd.nycbug.org/index.cgi?do=view=7234



Re: Ryzen 9 (7x000) users: do you experience hangs?

2023-07-28 Thread Paul de Weerd
(Apologies for the late reply, I've been off for a few days and have
spent very little time behind a keyboard)

I have such issues.

CPU model:
    hw.model=AMD Ryzen 9 7950X 16-Core Processor
Motherboard:
    hw.vendor=ASUS
    hw.product=ProArt X670E-CREATOR WIFI
Have you experienced crashes:
    Yes, after approximately 17 hours of uptime.  Could be 16,
    could be 18, but that ballpark.  I've been trying for months
    to identify what causes this, but no luck so far.
dmesg:
    (at the end)

So far, BIOS updates haven't helped but I see there's a newer BIOS
available again.  Will try to update soon, but am not holding my
breath that this will fix things.

Note that (at least for me) it's not really a full crash.  There's no
response on the glass console or over the network but since I have
serial console access, when I'm logged in there as root (before the
system gets in this weird state) I can still `reboot -q` (just
`reboot` gets stuck, but the (advised against) use of '-q' allows the
reboot to succeed and the machine reboots cleanly).  I'd be interested
if anyone else who has these issues could set up serial console and
see if they get the same behaviour.  Alternatively, start a tmux
session as root and do a `sleep ${WAIT_FOR_CRASH}; reboot -q` (with
appropriate values for WAIT_FOR_CRASH, obviously)
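
The tmux suggestion above could be wrapped in a small helper along the
following lines. This is only a sketch: `run_after` is a hypothetical name,
not something from this thread, and it chains with `&&` rather than `;` so
that an interrupted sleep does not go on to trigger the reboot.

```shell
#!/bin/sh
# run_after DELAY CMD [ARGS...]
# Hypothetical helper: sleep for DELAY seconds, then run CMD.
# CMD only runs if the sleep finished normally.
run_after() {
    delay=$1
    shift
    sleep "$delay" && "$@"
}

# Intended use, as root inside tmux, started before the hang:
#   run_after "$WAIT_FOR_CRASH" reboot -q
# As noted above, the use of -q is advised against, so this is
# meant for a test machine that is already hanging regularly.
```

Nothing is rebooted by defining the function; the reboot line is left as a
comment so it has to be typed deliberately.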

Paul

--- dmesg 
OpenBSD 7.3-current (GENERIC.MP) #58: Fri Jul 28 15:50:42 CEST 2023
we...@pom.alm.weirdnet.nl:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 136444977152 (130124MB)
avail mem = 132290076672 (126161MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.5 @ 0x794a3000 (81 entries)
bios0: vendor American Megatrends Inc. version "1415" date 05/16/2023
bios0: ASUS ProArt X670E-CREATOR WIFI
efi0 at bios0: UEFI 2.8
efi0: American Megatrends rev 0x5001a
acpi0 at bios0: ACPI 6.4Undefined scope: \\_SB_.PCI0.GPP7.UP00.DP40.UP00.DP68

acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP SSDT SSDT SSDT FIDT MCFG HPET WDRT FPDT VFCT BGRT WPBT 
TPM2 SSDT CRAT CDIT SSDT SSDT SSDT SSDT SSDT WSMT APIC IVRS SSDT SSDT SSDT SSDT 
SSDT
acpi0: wakeup devices GPP3(S4) GPP4(S4) GPP5(S4) GPP6(S4) GP17(S4) XHC0(S4) 
XHC1(S4) XHC2(S4) GPP0(S4) GPP1(S4) GPP2(S4) GPP7(S4) UP00(S4) DP40(S4) 
UP00(S4) DP00(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 32 bits
acpimcfg0 at acpi0
acpimcfg0: addr 0xf000, bus 0-127
acpihpet0 at acpi0: 14318180 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD Ryzen 9 7950X 16-Core Processor, 4500.01 MHz, 19-61-02
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,PQM,AVX512F,AVX512DQ,RDSEED,ADX,SMAP,AVX512IFMA,CLFLUSHOPT,CLWB,AVX512CD,SHA,AVX512BW,AVX512VL,AVX512VBMI,UMIP,PKU,L1DF,IBPB,IBRS,STIBP,STIBP_ALL,IBRS_PREF,IBRS_SM,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 
8-way L2 cache, 32MB 64b/line 16-way L3 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 25MHz
cpu0: mwait min=64, max=64, C-substates=1.1, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: AMD Ryzen 9 7950X 16-Core Processor, 4500.00 MHz, 19-61-02
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,PQM,AVX512F,AVX512DQ,RDSEED,ADX,SMAP,AVX512IFMA,CLFLUSHOPT,CLWB,AVX512CD,SHA,AVX512BW,AVX512VL,AVX512VBMI,UMIP,PKU,L1DF,IBPB,IBRS,STIBP,STIBP_ALL,IBRS_PREF,IBRS_SM,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu1: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 
8-way L2 cache, 32MB 64b/line 16-way L3 cache
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 4 (application processor)
cpu2: AMD Ryzen 9 7950X 16-Core Processor, 4500.00 MHz, 19-61-02
cpu2: 

Re: Ryzen 9 (7x000) users: do you experience hangs?

2023-07-18 Thread chohag
Is it something in the water?

Mike Larkin writes:
> On Tue, Jul 18, 2023 at 08:09:11PM +0100, cho...@jtan.com wrote:
>
> This is completely unrelated to the question we asked. Please

I mentioned that. Twice.

Beginning with the very first words:

> > Not really. But.

Then summarising with:

> > I don't know if that could help or even if it's related, but it can
^^^

(Emphasis added)

The symptoms are somewhat similar, and there is a glaring common
denominator. That is all.

Although it seemed doubtful, just in case another data point could
be helpful I hoped to provide enough information without drowning
the list in noise so that people more familiar with the matter such
as yourself could assess whether a deluge of data was warranted.

Don't worry. I won't do that again.

Matthew



Re: Ryzen 9 (7x000) users: do you experience hangs?

2023-07-18 Thread Joel Wirāmu Pauling
Just a personal anecdote that might be worth something.

On both of my AMD chipset motherboards (x570/x670E Proart Wifi), I was
getting microstutters and occasional odd hangs for the last year or so.
Reboots would often power off rather than power cycle, which I mostly
wrote off as an oddity of the mobo. I had a PSU blow (less than 2 years
in) on that build, which I put down to winter peak power being hot in NZ
(I measure 247V off the grid through the UPS).

It was a beQuiet 12 Pro 1000W, RMA'd and replaced with a 1300W beQuiet Pro,
which went BANG! after two days. After isolating the circuit and removing it
from the UPS, I went through another 2 beQuiet Pro 1300W units within a week
with the same bang (FET exploding) after a couple of days of working. For the
4th one I switched to a Corsair and it's been fine since.

Turns out there is some issue with that particular power supply brand and
compatibility with AMD chipsets, which is not a thing I was expecting to
find.

-Joel

On Wed, 19 Jul 2023 at 09:27, Kastus Shchuka wrote:

> On Tue, Jul 18, 2023 at 08:09:11PM +0100, cho...@jtan.com wrote:
> > Not really. But.
> >
> > I have an APU2 which runs two VMs that do practically nothing,
> > although the box itself is used actively. The VMs consistently, and
> > without warning, hang in a way which matches the description "nothing
> > new can be execed" although I recall being able to log in on the
> > console. I noticed shortly after I installed the VMs in around May
> > but I haven't got very far diagnosing it because it's a low priority.
> > However there is a common denominator: AMD
> >
> > cpu0 at mainbus0: apid 0 (boot processor)
> > cpu0: AMD G-T40E Processor, 1000.02 MHz, 14-02-00
> > cpu0:
> > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,SSSE3,CX16,POPCNT,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,IBS,SKINIT,ITSC
> > cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 2-way I-cache
> > cpu0: 512KB 64b/line 16-way L2 cache
> > cpu0: smt 0, core 0, package 0
> >
> > Times two.
> >
> > As you say the existing processes seem to work fine right up until
> > sshd is nearly (but not quite?) ready to fork:
> >
> > .
> > .
> > .
> > debug1: SSH2_MSG_EXT_INFO received
> > debug1: kex_input_ext_info: server-sig-algs= sk-ssh-ed25...@openssh.com
> > ,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,
> > sk-ecdsa-sha2-nistp...@openssh.com,
> > webauthn-sk-ecdsa-sha2-nistp...@openssh.com
> > ,ssh-dss,ssh-rsa,rsa-sha2-256,rsa-sha2-512>
> > debug1: kex_input_ext_info: publickey-hostbo...@openssh.com=<0>
> > debug1: SSH2_MSG_SERVICE_ACCEPT received
> >
> > Ordinarily it would next attempt authentication. Does sshd fork and
> > drop privileges to do that?
> >
> > I don't know if that could help or even if it's related, but it can
> > be reproduced with confidence. I can poke the box or its VMs any
> > way that could shake some data loose.
> >
> > Matthew
> >
>
> Is the AMD erratum referenced from https://inks.tedunangst.com/l/4996 at all
> relevant?
> (erratum #1474 in
> https://www.amd.com/system/files/TechDocs/56323-PUB_1.01.pdf)
>
> -Kastus
>
>


Re: Ryzen 9 (7x000) users: do you experience hangs?

2023-07-18 Thread Mike Larkin
On Tue, Jul 18, 2023 at 01:19:14PM -0700, Kastus Shchuka wrote:
> On Tue, Jul 18, 2023 at 08:09:11PM +0100, cho...@jtan.com wrote:
> > Not really. But.
> >
> > I have an APU2 which runs two VMs that do practically nothing,
> > although the box itself is used actively. The VMs consistently, and
> > without warning, hang in a way which matches the description "nothing
> > new can be execed" although I recall being able to log in on the
> > console. I noticed shortly after I installed the VMs in around May
> > but I haven't got very far diagnosing it because it's a low priority.
> > However there is a common denominator: AMD
> >
> > cpu0 at mainbus0: apid 0 (boot processor)
> > cpu0: AMD G-T40E Processor, 1000.02 MHz, 14-02-00
> > cpu0: 
> > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,SSSE3,CX16,POPCNT,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,IBS,SKINIT,ITSC
> > cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 2-way I-cache
> > cpu0: 512KB 64b/line 16-way L2 cache
> > cpu0: smt 0, core 0, package 0
> >
> > Times two.
> >
> > As you say the existing processes seem to work fine right up until
> > sshd is nearly (but not quite?) ready to fork:
> >
> > .
> > .
> > .
> > debug1: SSH2_MSG_EXT_INFO received
> > debug1: kex_input_ext_info: 
> > server-sig-algs=
> > debug1: kex_input_ext_info: publickey-hostbo...@openssh.com=<0>
> > debug1: SSH2_MSG_SERVICE_ACCEPT received
> >
> > Ordinarily it would next attempt authentication. Does sshd fork and
> > drop privileges to do that?
> >
> > I don't know if that could help or even if it's related, but it can
> > be reproduced with confidence. I can poke the box or its VMs any
> > way that could shake some data loose.
> >
> > Matthew
> >
>
> Is the AMD erratum referenced from https://inks.tedunangst.com/l/4996 at all relevant?
> (erratum #1474 in https://www.amd.com/system/files/TechDocs/56323-PUB_1.01.pdf)
>
> -Kastus
>

no



Re: Ryzen 9 (7x000) users: do you experience hangs?

2023-07-18 Thread Kastus Shchuka
On Tue, Jul 18, 2023 at 08:09:11PM +0100, cho...@jtan.com wrote:
> Not really. But.
> 
> I have an APU2 which runs two VMs that do practically nothing,
> although the box itself is used actively. The VMs consistently, and
> without warning, hang in a way which matches the description "nothing
> new can be execed" although I recall being able to log in on the
> console. I noticed shortly after I installed the VMs in around May
> but I haven't got very far diagnosing it because it's a low priority.
> However there is a common denominator: AMD
> 
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: AMD G-T40E Processor, 1000.02 MHz, 14-02-00
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,SSSE3,CX16,POPCNT,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,IBS,SKINIT,ITSC
> cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 2-way I-cache
> cpu0: 512KB 64b/line 16-way L2 cache
> cpu0: smt 0, core 0, package 0
> 
> Times two.
> 
> As you say the existing processes seem to work fine right up until
> sshd is nearly (but not quite?) ready to fork:
> 
> .
> .
> .
> debug1: SSH2_MSG_EXT_INFO received
> debug1: kex_input_ext_info: 
> server-sig-algs=
> debug1: kex_input_ext_info: publickey-hostbo...@openssh.com=<0>
> debug1: SSH2_MSG_SERVICE_ACCEPT received
> 
> Ordinarily it would next attempt authentication. Does sshd fork and
> drop privileges to do that?
> 
> I don't know if that could help or even if it's related, but it can
> be reproduced with confidence. I can poke the box or its VMs any
> way that could shake some data loose.
> 
> Matthew
> 

Is the AMD erratum referenced from https://inks.tedunangst.com/l/4996 at all relevant?
(erratum #1474 in https://www.amd.com/system/files/TechDocs/56323-PUB_1.01.pdf)

-Kastus



Re: Ryzen 9 (7x000) users: do you experience hangs?

2023-07-18 Thread Mike Larkin
On Tue, Jul 18, 2023 at 08:09:11PM +0100, cho...@jtan.com wrote:

This is completely unrelated to the question we asked. Please
don't hijack the thread.

> Not really. But.
>
> I have an APU2 which runs two VMs that do practically nothing,
> although the box itself is used actively. The VMs consistently, and
> without warning, hang in a way which matches the description "nothing
> new can be execed" although I recall being able to log in on the
> console. I noticed shortly after I installed the VMs in around May
> but I haven't got very far diagnosing it because it's a low priority.
> However there is a common denominator: AMD
>
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: AMD G-T40E Processor, 1000.02 MHz, 14-02-00
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,SSSE3,CX16,POPCNT,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,IBS,SKINIT,ITSC
> cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 2-way I-cache
> cpu0: 512KB 64b/line 16-way L2 cache
> cpu0: smt 0, core 0, package 0
>
> Times two.
>
> As you say the existing processes seem to work fine right up until
> sshd is nearly (but not quite?) ready to fork:
>
> .
> .
> .
> debug1: SSH2_MSG_EXT_INFO received
> debug1: kex_input_ext_info: 
> server-sig-algs=
> debug1: kex_input_ext_info: publickey-hostbo...@openssh.com=<0>
> debug1: SSH2_MSG_SERVICE_ACCEPT received
>
> Ordinarily it would next attempt authentication. Does sshd fork and
> drop privileges to do that?
>
> I don't know if that could help or even if it's related, but it can
> be reproduced with confidence. I can poke the box or its VMs any
> way that could shake some data loose.
>
> Matthew
>



Re: Ryzen 9 (7x000) users: do you experience hangs?

2023-07-18 Thread chohag
Not really. But.

I have an APU2 which runs two VMs that do practically nothing,
although the box itself is used actively. The VMs consistently, and
without warning, hang in a way which matches the description "nothing
new can be execed" although I recall being able to log in on the
console. I noticed shortly after I installed the VMs in around May
but I haven't got very far diagnosing it because it's a low priority.
However there is a common denominator: AMD

cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD G-T40E Processor, 1000.02 MHz, 14-02-00
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,SSSE3,CX16,POPCNT,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,IBS,SKINIT,ITSC
cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 2-way I-cache
cpu0: 512KB 64b/line 16-way L2 cache
cpu0: smt 0, core 0, package 0

Times two.

As you say the existing processes seem to work fine right up until
sshd is nearly (but not quite?) ready to fork:

.
.
.
debug1: SSH2_MSG_EXT_INFO received
debug1: kex_input_ext_info: 
server-sig-algs=
debug1: kex_input_ext_info: publickey-hostbo...@openssh.com=<0>
debug1: SSH2_MSG_SERVICE_ACCEPT received

Ordinarily it would next attempt authentication. Does sshd fork and
drop privileges to do that?

I don't know if that could help or even if it's related, but it can
be reproduced with confidence. I can poke the box or its VMs any
way that could shake some data loose.

Matthew



Re: Ryzen 9 (7x000) users: do you experience hangs?

2023-07-18 Thread Mike Larkin
On Tue, Jul 18, 2023 at 09:43:51AM +0100, Laurence Tratt wrote:
> A small number of us with AMD Ryzen 9 (i.e. chips in the 7x000 range)
> machines have been experiencing regular (often daily), or semi-regular
> hangs, but without any obvious cause.
>
> What we don't know is if we're the unlucky few, or whether this might be a
> wider issue. So, to see if there is some sort of pattern going on (e.g. are
> certain motherboards / BIOSes correlated with hangs or not?), I'd like to
> poll Ryzen 9 OpenBSD users. At a minimum we'd need to know:
>
>   CPU model (e.g. "7900x")
>   Motherboard (e.g. "MSI PRO670-X")
>   Have you experienced crashes? (Yes/No) If "Yes":
>   what frequency (e.g. "daily/weekly/no obvious pattern")?
>   are there any obvious causes (e.g. "happens when I run program X")?
>   have you found any mitigations (e.g. "updated BIOS")?
>   Ideally a dmesg too
>
> We're as interested in Ryzen 9 users who aren't experiencing hangs as who
> are! Please feel free to reply to the list, or to me individually, and I'll
> collate the information and see if there are any patterns or not.
>
>
> Laurie
> --
> Personal                    https://tratt.net/laurie/
> Software Development Team   https://soft-dev.org/
>    https://github.com/ltratt https://twitter.com/laurencetratt
>

A bit of color commentary here... Laurie and I and a few other folks have been
trying to debug the hangs that some people are seeing on these machines. He and
I have identical hardware and he sees regular hangs, and I rarely see any (I
think in the span of 7 months I've seen maybe 2 or 3 total). I've been using
this machine in anger as a daily driver and I can't make it break and other
people can't even make it a day without a hang.

We've tried to debug the issue and narrow down what device(s) might be causing
the problem, or what workload, etc, but nothing is pointing in any specific
direction.

We've also seen reports of "long slow death" crashes where existing processes
continue to work for some time but nothing new can be execed, and eventually
even the existing processes freeze. To me that sounds like a lock issue but
it never happens on my machine and only infrequently elsewhere, so I can't
really debug it.

We'd like to know if others have similar machines and if they are stable or
not.

-ml



Ryzen 9 (7x000) users: do you experience hangs?

2023-07-18 Thread Laurence Tratt
A small number of us with AMD Ryzen 9 (i.e. chips in the 7x000 range)
machines have been experiencing regular (often daily), or semi-regular
hangs, but without any obvious cause.

What we don't know is if we're the unlucky few, or whether this might be a
wider issue. So, to see if there is some sort of pattern going on (e.g. are
certain motherboards / BIOSes correlated with hangs or not?), I'd like to
poll Ryzen 9 OpenBSD users. At a minimum we'd need to know:

  CPU model (e.g. "7900x")
  Motherboard (e.g. "MSI PRO670-X")
  Have you experienced crashes? (Yes/No) If "Yes":
  what frequency (e.g. "daily/weekly/no obvious pattern")?
  are there any obvious causes (e.g. "happens when I run program X")?
  have you found any mitigations (e.g. "updated BIOS")?
  Ideally a dmesg too

We're as interested in Ryzen 9 users who aren't experiencing hangs as who
are! Please feel free to reply to the list, or to me individually, and I'll
collate the information and see if there are any patterns or not.
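
The hardware fields requested above can be read from sysctl(8), as one
reply in this thread does with hw.model, hw.vendor and hw.product. A
minimal sketch follows; `poll_report` is a hypothetical helper name, and
it assumes sysctl prints `name=value` lines as it does on OpenBSD.

```shell
#!/bin/sh
# poll_report: hypothetical helper that reads "name=value" lines
# (the format OpenBSD's sysctl prints) on stdin and prints the
# CPU and motherboard fields the poll asks for.
poll_report() {
    while IFS='=' read -r name value; do
        case $name in
            hw.model)   echo "CPU model: $value" ;;
            hw.vendor)  echo "Motherboard vendor: $value" ;;
            hw.product) echo "Motherboard product: $value" ;;
        esac
    done
}

# Intended use on OpenBSD:
#   sysctl hw.model hw.vendor hw.product | poll_report
#   dmesg > dmesg.txt    # attach alongside the report
```

The remaining questions (hang frequency, obvious causes, mitigations) still
need to be answered by hand.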


Laurie
-- 
Personal                    https://tratt.net/laurie/
Software Development Team   https://soft-dev.org/
   https://github.com/ltratt https://twitter.com/laurencetratt