Re: Ryzen 9 (7x000) users: do you experience hangs?
On 2023-08-03, Mihai Popescu wrote:
> Not a Ryzen 9, but an AMD.

you seem to be on the wrong thread. this one is *specifically* for ryzen 9.
here is the original message:

| A small number of us with AMD Ryzen 9 (i.e. chips in the 7x000 range)
| machines have been experiencing regular (often daily), or semi-regular
| hangs, but without any obvious cause.
|
| What we don't know is if we're the unlucky few, or whether this might be a
| wider issue. So, to see if there is some sort of pattern going on (e.g. are
| certain motherboards / BIOSes correlated with hangs or not?), I'd like to
| poll Ryzen 9 OpenBSD users. At a minimum we'd need to know:
|
|   CPU model (e.g. "7900x")
|   Motherboard (e.g. "MSI PRO670-X")
|   Have you experienced crashes? (Yes/No) If "Yes":
|     what frequency (e.g. "daily/weekly/no obvious pattern")?
|     are there any obvious causes (e.g. "happens when I run program X")?
|     have you found any mitigations (e.g. "updated BIOS")?
|   Ideally a dmesg too
|
| We're as interested in Ryzen 9 users who aren't experiencing hangs as in
| those who are! Please feel free to reply to the list, or to me individually,
| and I'll collate the information and see if there are any patterns or not.

anything else is noise on this thread. please report it separately.
Re: Ryzen 9 (7x000) users: do you experience hangs?
I got the system freeze from the previous email again. I don't know which "debug" commands to run; please point me in the right direction. Thank you.
Re: Ryzen 9 (7x000) users: do you experience hangs?
Not a Ryzen 9, but an AMD.

I installed the snapshot and started playing endless-sky from packages. I went away, and when I came back the game was no longer responsive. I switched to the console with Alt+Ctrl+F2, logged in as a user and ran top: endless-sky was listed as sleep / kqread. I logged out and tried to switch back to X; the screen went dark and I could not get any console, so I rebooted.

This is strange, since the game ran rock solid on previous 7.x snapshots. What should I run next time to get some good data?

dmesg output:
( some ^x strange characters removed )
( note the devaadt name - not edited by me )

OpenBSD 7.3-current (RAMDISK_CD) #1251: Wed Aug 2 10:32:58 MDT 2023
    deva...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/RAMDISK_CD
real mem = 7711522816 (7354MB)
avail mem = 7473790976 (7127MB)
random: good seed from bootblocks
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xe86df (74 entries)
bios0: vendor Hewlett-Packard version "K06 v02.77" date 03/22/2018
bios0: Hewlett-Packard HP Compaq Pro 6305 SFF
acpi0 at bios0: ACPI 5.0
acpi0: tables DSDT FACP APIC FPDT MCFG HPET SSDT SLIC MSDM TCPA IVRS VFCT SSDT SSDT CRAT
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 16 (boot processor)
cpu0: AMD A8-5500B APU with Radeon(tm) HD Graphics, 3194.38 MHz, 15-10-01
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,POPCNT,AES,XSAVE,AVX,F16C,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,XOP,SKINIT,WDT,FMA4,TCE,NODEID,TBM,TOPEXT,CPCTR,ITSC,BMI1,IBPB
cpu0: 16KB 64b/line 4-way D-cache, 64KB 64b/line 2-way I-cache, 2MB 64b/line 16-way L2 cache
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, IBE
cpu at mainbus0: not configured
cpu at mainbus0: not configured
cpu at mainbus0: not configured
ioapic0 at mainbus0: apid 5 pa 0xfec0, version 21, 24 pins
acpihpet0 at acpi0: 14318180 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus -1 (BR13)
acpiprt2 at acpi0: bus -1 (BR15)
acpiprt3 at acpi0: bus -1 (BR16)
acpiprt4 at acpi0: bus -1 (BR17)
acpiprt5 at acpi0: bus 1 (P0PC)
acpiprt6 at acpi0: bus 2 (PE20)
acpiprt7 at acpi0: bus -1 (PE21)
acpiprt8 at acpi0: bus 3 (PE22)
acpiprt9 at acpi0: bus -1 (PE23)
acpiprt10 at acpi0: bus -1 (BR12)
acpiprt11 at acpi0: bus -1 (BR14)
acpipci0 at acpi0 PCI0: 0x 0x0011 0x0001
acpicmos0 at acpi0
com0 at acpi0 UAR1 addr 0x3f8/0x8 irq 4: ns16550a, 16 byte fifo
"IFX0102" at acpi0 not configured
"PNP0C0C" at acpi0 not configured
"PNP0C14" at acpi0 not configured
acpicpu at acpi0 not configured
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "AMD 15/1xh Host" rev 0x00
"AMD 15/1xh IOMMU" rev 0x00 at pci0 dev 0 function 2 not configured
"ATI Radeon HD 7560D" rev 0x00 at pci0 dev 1 function 0 not configured
vendor "ATI", unknown product 0x9902 (class multimedia subclass hdaudio, rev 0x00) at pci0 dev 1 function 1 not configured
xhci0 at pci0 dev 16 function 0 "AMD Hudson-2 xHCI" rev 0x03: msix, xHCI 0.96
usb0 at xhci0: USB revision 3.0
uhub0 at usb0 configuration 1 interface 0 "AMD xHCI root hub" rev 3.00/1.00 addr 1
xhci1 at pci0 dev 16 function 1 "AMD Hudson-2 xHCI" rev 0x03: msix, xHCI 0.96
usb1 at xhci1: USB revision 3.0
uhub1 at usb1 configuration 1 interface 0 "AMD xHCI root hub" rev 3.00/1.00 addr 1
ahci0 at pci0 dev 17 function 0 "AMD Hudson-2 SATA" rev 0x40: msi, AHCI 1.3
ahci0: port 0: 6.0Gb/s
ahci0: port 2: 1.5Gb/s
scsibus0 at ahci0: 32 targets
sd0 at scsibus0 targ 0 lun 0: naa.5000c5006520feaf
sd0: 238475MB, 512 bytes/sector, 488397168 sectors
cd0 at scsibus0 targ 2 lun 0: removable
ohci0 at pci0 dev 18 function 0 "AMD Hudson-2 USB" rev 0x11: apic 5 int 18, version 1.0, legacy support
ehci0 at pci0 dev 18 function 2 "AMD Hudson-2 USB2" rev 0x11: apic 5 int 17
usb2 at ehci0: USB revision 2.0
uhub2 at usb2 configuration 1 interface 0 "AMD EHCI root hub" rev 2.00/1.00 addr 1
ohci1 at pci0 dev 19 function 0 "AMD Hudson-2 USB" rev 0x11: apic 5 int 18, version 1.0, legacy support
ehci1 at pci0 dev 19 function 2 "AMD Hudson-2 USB2" rev 0x11: apic 5 int 17
usb3 at ehci1: USB revision 2.0
uhub3 at usb3 configuration 1 interface 0 "AMD EHCI root hub" rev 2.00/1.00 addr 1
"AMD Hudson-2 SMBus" rev 0x14 at pci0 dev 20 function 0 not configured
"AMD Hudson-2 HD Audio" rev 0x01 at pci0 dev 20 function 2 not configured
"AMD Hudson-2 LPC" rev 0x11 at pci0 dev 20 function 3 not configured
ppb0 at pci0 dev 20 function 4 "AMD Hudson-2 PCI" rev 0x40
pci1 at ppb0 bus 1
ohci2 at pci0 dev 20 function 5 "AMD Hudson-2 USB" rev 0x11: apic 5 int 18, version 1.0, legacy support
ppb1 at pci0 dev 21 function 0 "AMD Hudson-2 PCIE" rev 0x00
pci2 at ppb1 bus 2
ppb2 at pci0 dev 21 function 2 "AMD Hudson-2 PCIE" rev 0x00
pci3 at ppb2 bus 3
bge0 at pci3 dev 0 function 0 "Broadcom BCM5761" rev 0x10, BCM5761 A1 (0x5761100): msi, address
Re: Ryzen 9 (7x000) users: do you experience hangs?
Hi,

I've been running OpenBSD-current on my Lenovo ThinkPad T14 AMD Gen1 for many years now without any big issues so far (currently OpenBSD 7.3-current (GENERIC.MP) #1314: Tue Jul 25 17:02:17 MDT 202).

A few weeks ago, my system started to hang randomly; many times it was linked to high Firefox memory usage or happened after a suspend. In both cases the network is unreachable, and the only way to "fix" the issue is a hard reboot. I have had many X11 hangs in the past, but those were easily fixed by killing X11 or Firefox, or just by rebooting the laptop remotely.

Here is my dmesg: https://dmesgd.nycbug.org/index.cgi?do=view=7234
Re: Ryzen 9 (7x000) users: do you experience hangs?
(Apologies for the late reply; I've been off for a few days and have spent very little time behind a keyboard.)

I have such issues.

CPU model: hw.model=AMD Ryzen 9 7950X 16-Core Processor
Motherboard: hw.vendor=ASUS hw.product=ProArt X670E-CREATOR WIFI
Have you experienced crashes: Yes, after approximately 17 hours of uptime.
  Could be 16, could be 18, but in that ballpark. I've been trying for
  months to identify what causes this, but with no luck so far.
dmesg: (at the end)

So far, BIOS updates haven't helped, but I see a newer BIOS is available again. I will try to update soon, but I'm not holding my breath that this will fix things.

Note that (at least for me) it's not really a full crash. There's no response on the glass console or over the network, but I have serial console access, and if I'm logged in there as root before the system gets into this weird state, I can still run `reboot -q`. (Plain `reboot` gets stuck, but the (advised against) `-q` flag lets the reboot succeed, and the machine reboots cleanly.)

I'd be interested to hear whether anyone else who has these issues could set up a serial console and see if they get the same behaviour. Alternatively, start a tmux session as root and run `sleep ${WAIT_FOR_CRASH}; reboot -q` (with an appropriate value for WAIT_FOR_CRASH, obviously).

Paul

---
dmesg
OpenBSD 7.3-current (GENERIC.MP) #58: Fri Jul 28 15:50:42 CEST 2023
    we...@pom.alm.weirdnet.nl:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 136444977152 (130124MB)
avail mem = 132290076672 (126161MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.5 @ 0x794a3000 (81 entries)
bios0: vendor American Megatrends Inc. version "1415" date 05/16/2023
bios0: ASUS ProArt X670E-CREATOR WIFI
efi0 at bios0: UEFI 2.8
efi0: American Megatrends rev 0x5001a
acpi0 at bios0: ACPI 6.4Undefined scope: \\_SB_.PCI0.GPP7.UP00.DP40.UP00.DP68
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP SSDT SSDT SSDT FIDT MCFG HPET WDRT FPDT VFCT BGRT WPBT TPM2 SSDT CRAT CDIT SSDT SSDT SSDT SSDT SSDT WSMT APIC IVRS SSDT SSDT SSDT SSDT SSDT
acpi0: wakeup devices GPP3(S4) GPP4(S4) GPP5(S4) GPP6(S4) GP17(S4) XHC0(S4) XHC1(S4) XHC2(S4) GPP0(S4) GPP1(S4) GPP2(S4) GPP7(S4) UP00(S4) DP40(S4) UP00(S4) DP00(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 32 bits
acpimcfg0 at acpi0
acpimcfg0: addr 0xf000, bus 0-127
acpihpet0 at acpi0: 14318180 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD Ryzen 9 7950X 16-Core Processor, 4500.01 MHz, 19-61-02
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,PQM,AVX512F,AVX512DQ,RDSEED,ADX,SMAP,AVX512IFMA,CLFLUSHOPT,CLWB,AVX512CD,SHA,AVX512BW,AVX512VL,AVX512VBMI,UMIP,PKU,L1DF,IBPB,IBRS,STIBP,STIBP_ALL,IBRS_PREF,IBRS_SM,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 8-way L2 cache, 32MB 64b/line 16-way L3 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 25MHz
cpu0: mwait min=64, max=64, C-substates=1.1, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: AMD Ryzen 9 7950X 16-Core Processor, 4500.00 MHz, 19-61-02
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,PQM,AVX512F,AVX512DQ,RDSEED,ADX,SMAP,AVX512IFMA,CLFLUSHOPT,CLWB,AVX512CD,SHA,AVX512BW,AVX512VL,AVX512VBMI,UMIP,PKU,L1DF,IBPB,IBRS,STIBP,STIBP_ALL,IBRS_PREF,IBRS_SM,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu1: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 8-way L2 cache, 32MB 64b/line 16-way L3 cache
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 4 (application processor)
cpu2: AMD Ryzen 9 7950X 16-Core Processor, 4500.00 MHz, 19-61-02
cpu2:
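Paul's suggested test (a root tmux session running `sleep ${WAIT_FOR_CRASH}; reboot -q`) can be wrapped in a small shell script. This is only a sketch: the function names `hours_to_seconds` and `schedule_reboot`, and the 20-hour default, are hypothetical and chosen for illustration. Pick a delay comfortably longer than the uptime at which your own machine usually hangs (about 17 hours in Paul's case). The script only defines functions, so sourcing it does nothing by itself.

```shell
#!/bin/sh
# Sketch of the "tmux + delayed reboot -q" test suggested above.
# Hypothetical helper names; the 20-hour default is only an example.
# Source this file as root inside a tmux session started *before* the
# expected hang, then call: schedule_reboot <hours>

# Convert a whole number of hours to seconds for sleep(1).
hours_to_seconds() {
	echo $(( $1 * 60 * 60 ))
}

# Wait past the point where the hang usually occurs, then reboot.
# Per the report above, plain reboot(8) got stuck in the hung state,
# while reboot -q succeeded.
schedule_reboot() {
	WAIT_FOR_CRASH=$(hours_to_seconds "${1:-20}")
	echo "rebooting with reboot -q in ${WAIT_FOR_CRASH} seconds"
	sleep "$WAIT_FOR_CRASH"
	reboot -q
}
```

For example, inside tmux: `. ./reboot-test.sh; schedule_reboot 20` (the file name is also just an example). If the machine has hung by then and still comes back, you are probably seeing the same behaviour Paul describes over his serial console.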
Re: Ryzen 9 (7x000) users: do you experience hangs?
Is it something in the water?

Mike Larkin writes:
> On Tue, Jul 18, 2023 at 08:09:11PM +0100, cho...@jtan.com wrote:
>
> This is completely unrelated to the question we asked. Please

I mentioned that. Twice. Beginning with the very first words:

> > Not really. But.

Then summarising with:

> > I don't know if that could help or even if it's related, but it can
^^^ (Emphasis added)

The symptoms are somewhat similar, and there is a glaring common denominator. That is all.

Although it seemed doubtful, just in case another data point could be helpful, I hoped to provide enough information without drowning the list in noise, so that people more familiar with the matter, such as yourself, could assess whether a deluge of data was warranted.

Don't worry. I won't do that again.

Matthew
Re: Ryzen 9 (7x000) users: do you experience hangs?
Just a personal anecdote that might be worth something.

On both of my AMD-chipset motherboards (X570 / X670E ProArt WiFi) I had been getting microstutters and occasional odd hangs for the last year or so, and reboots would often power off rather than power cycle, which I mostly wrote off as an oddity of the motherboard.

I had a PSU blow (less than two years in) on that build, which I put down to winter peak power being hot in NZ (I measure 247V off the grid through the UPS). It was a beQuiet 12 Pro 1000W; it was RMA'd and replaced with a 1300W beQuiet Pro, which went BANG! after two days. After isolating the circuit and removing it from the UPS, I went through another two beQuiet Pro 1300W units within a week, each with the same bang (a FET exploding) after a couple of days of working. For the fourth I switched to a Corsair, and it's been fine since.

It turns out there is some compatibility issue between that particular power supply brand and AMD chipsets, which is not something I was expecting to find.

-Joel

On Wed, 19 Jul 2023 at 09:27, Kastus Shchuka wrote:
> On Tue, Jul 18, 2023 at 08:09:11PM +0100, cho...@jtan.com wrote:
> > Not really. But.
> > [...]
>
> Is the AMD erratum referenced from https://inks.tedunangst.com/l/4996 at all relevant?
> (erratum #1474 in https://www.amd.com/system/files/TechDocs/56323-PUB_1.01.pdf)
>
> -Kastus
Re: Ryzen 9 (7x000) users: do you experience hangs?
On Tue, Jul 18, 2023 at 01:19:14PM -0700, Kastus Shchuka wrote:
> On Tue, Jul 18, 2023 at 08:09:11PM +0100, cho...@jtan.com wrote:
> > Not really. But.
> > [...]
>
> Is the AMD erratum referenced from https://inks.tedunangst.com/l/4996 at all relevant?
> (erratum #1474 in https://www.amd.com/system/files/TechDocs/56323-PUB_1.01.pdf)
>
> -Kastus

no
Re: Ryzen 9 (7x000) users: do you experience hangs?
On Tue, Jul 18, 2023 at 08:09:11PM +0100, cho...@jtan.com wrote:
> Not really. But.
> [...]
> However there is a common denominator: AMD
> [...]
> Matthew

Is the AMD erratum referenced from https://inks.tedunangst.com/l/4996 at all relevant?
(erratum #1474 in https://www.amd.com/system/files/TechDocs/56323-PUB_1.01.pdf)

-Kastus
Re: Ryzen 9 (7x000) users: do you experience hangs?
On Tue, Jul 18, 2023 at 08:09:11PM +0100, cho...@jtan.com wrote:

This is completely unrelated to the question we asked. Please don't hijack the thread.

> Not really. But.
> [...]
Re: Ryzen 9 (7x000) users: do you experience hangs?
Not really. But.

I have an APU2 which runs two VMs that do practically nothing, although the box itself is used actively. The VMs consistently, and without warning, hang in a way which matches the description "nothing new can be execed" although I recall being able to log in on the console. I noticed shortly after I installed the VMs in around May but I haven't got very far diagnosing it because it's a low priority. However there is a common denominator: AMD

cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD G-T40E Processor, 1000.02 MHz, 14-02-00
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,SSSE3,CX16,POPCNT,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,IBS,SKINIT,ITSC
cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 2-way I-cache
cpu0: 512KB 64b/line 16-way L2 cache
cpu0: smt 0, core 0, package 0

Times two.

As you say the existing processes seem to work fine right up until sshd is nearly (but not quite?) ready to fork:

.
.
.
debug1: SSH2_MSG_EXT_INFO received
debug1: kex_input_ext_info: server-sig-algs=
debug1: kex_input_ext_info: publickey-hostbo...@openssh.com=<0>
debug1: SSH2_MSG_SERVICE_ACCEPT received

Ordinarily it would next attempt authentication. Does sshd fork and drop privileges to do that?

I don't know if that could help or even if it's related, but it can be reproduced with confidence. I can poke the box or its VMs any way that could shake some data loose.

Matthew
Re: Ryzen 9 (7x000) users: do you experience hangs?
On Tue, Jul 18, 2023 at 09:43:51AM +0100, Laurence Tratt wrote:
> A small number of us with AMD Ryzen 9 (i.e. chips in the 7x000 range)
> machines have been experiencing regular (often daily), or semi-regular
> hangs, but without any obvious cause.
> [...]

A bit of color commentary here...

Laurie and I and a few other folks have been trying to debug the hangs that some people are seeing on these machines. He and I have identical hardware; he sees regular hangs, while I rarely see any (I think in the span of 7 months I've seen maybe 2 or 3 total). I've been using this machine in anger as a daily driver and I can't make it break, yet other people can't even make it through a day without a hang.

We've tried to debug the issue and narrow down what device(s) might be causing the problem, or what workload, etc., but nothing is pointing in any specific direction.
We've also seen reports of "long slow death" crashes where existing processes continue to work for some time but nothing new can be execed, and eventually even the existing processes freeze. To me that sounds like a lock issue, but it never happens on my machine and only infrequently elsewhere, so I can't really debug it.

We'd like to know if others have similar machines and if they are stable or not.

-ml
Ryzen 9 (7x000) users: do you experience hangs?
A small number of us with AMD Ryzen 9 (i.e. chips in the 7x000 range)
machines have been experiencing regular (often daily), or semi-regular
hangs, but without any obvious cause.

What we don't know is if we're the unlucky few, or whether this might be a
wider issue. So, to see if there is some sort of pattern going on (e.g. are
certain motherboards / BIOSes correlated with hangs or not?), I'd like to
poll Ryzen 9 OpenBSD users. At a minimum we'd need to know:

  CPU model (e.g. "7900x")
  Motherboard (e.g. "MSI PRO670-X")
  Have you experienced crashes? (Yes/No) If "Yes":
    what frequency (e.g. "daily/weekly/no obvious pattern")?
    are there any obvious causes (e.g. "happens when I run program X")?
    have you found any mitigations (e.g. "updated BIOS")?
  Ideally a dmesg too

We're as interested in Ryzen 9 users who aren't experiencing hangs as in
those who are! Please feel free to reply to the list, or to me individually,
and I'll collate the information and see if there are any patterns or not.


Laurie
-- 
Personal https://tratt.net/laurie/
Software Development Team https://soft-dev.org/
https://github.com/ltratt https://twitter.com/laurencetratt