Re: Catastrophic machine freezes - X related
You’re using tmux with or without X? We’re getting different errors. Thus far my errors are definitely X related.

Coincidentally I was just working on this. My machine crashed, and my logs are showing:

rwsleep_nsec: Xorg[98908]: fsleep: trying to sleep zero nanoseconds

I’m looking into it as we speak.

On Sun, Mar 8, 2020 at 4:09 PM Avon Robertson wrote:
> Hello Justin and Stuart,
>
> It is possible that the errors that I have found in /var/log/messages*
> are unrelated to the above. Thoughts?
>
> I have noticed that the freezes on this machine occur more quickly if I
> am working within tmux(1), as I was; at the time that the last freeze
> occurred. That may have been sheer coincidence.
>
> $ grep ERROR /var/log/messag*
> /var/log/messages:Mar 8 16:20:10 gx470 /bsd: [drm] *ERROR* ring gfx timeout, signaled seq=385, emitted seq=387
> /var/log/messages:Mar 9 07:06:34 gx470 /bsd: [drm] *ERROR* Illegal register access in comma
Re: Catastrophic
On Sat, Feb 29, 2020 at 07:41:59AM -0800, Justin Noor wrote:
> Awesome - thank you for your time and for the valuable information.
>
> That’s hilarious about the serial port. I’ll try plugging into a switch,
> reproducing the crash, and SSHing into it. I still haven’t tried the
> syslogd tip you mentioned either. It’s time for me to start learning more
> about X. Will be in touch.
>
> Regards

Hello Justin and Stuart,

It is possible that the errors that I have found in /var/log/messages* are unrelated to the above. Thoughts?

I have noticed that the freezes on this machine occur more quickly if I am working within tmux(1), as I was at the time that the last freeze occurred. That may have been sheer coincidence.

$ grep ERROR /var/log/messag*
/var/log/messages:Mar 8 16:20:10 gx470 /bsd: [drm] *ERROR* ring gfx timeout, signaled seq=385, emitted seq=387
/var/log/messages:Mar 9 07:06:34 gx470 /bsd: [drm] *ERROR* Illegal register access in command stream
/var/log/messages:Mar 9 07:06:44 gx470 /bsd: [drm] *ERROR* ring gfx timeout, signaled seq=794, emitted seq=796

My machine's last freeze occurred at the time of the last error in /var/log/messages.

I am able to log in to this machine remotely and access files when it is frozen, using kermit(1) and a USB to Serial adapter.

The machine's /var/run/dmesg.boot can be found in my first email to this thread.

Regards
Avon
--
aer
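For anyone who wants to line the [drm] errors up against freeze times, the grep output above can be turned into structured records. What follows is only a minimal Python sketch, assuming the log lines have the shape shown in this thread. The name parse_drm_errors and the "pending" field are made up for illustration, and reading "emitted minus signaled" as work submitted to the ring but not yet completed is an interpretation, not something the thread confirms.

```python
import re

# Matches a syslog line such as
#   Mar 8 16:20:10 gx470 /bsd: [drm] *ERROR* ring gfx timeout, signaled seq=385, emitted seq=387
# An optional "filename:" prefix, as printed by grep over several files, is tolerated.
DRM_ERROR = re.compile(
    r"^(?:\S+?:)?(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+/bsd:\s+\[drm\]\s+\*ERROR\*\s+(?P<msg>.*)$"
)
SEQ = re.compile(r"signaled seq=(\d+), emitted seq=(\d+)")


def parse_drm_errors(lines):
    """Return one dict per [drm] *ERROR* line; other lines are skipped."""
    records = []
    for line in lines:
        match = DRM_ERROR.match(line.rstrip("\n"))
        if match is None:
            continue
        msg = match.group("msg")
        seq = SEQ.search(msg)
        records.append({
            # Collapse the double space syslog uses for single-digit days.
            "stamp": " ".join(match.group("stamp").split()),
            "host": match.group("host"),
            "msg": msg,
            # Assumption: emitted - signaled = fences still outstanding.
            "pending": int(seq.group(2)) - int(seq.group(1)) if seq else None,
        })
    return records


if __name__ == "__main__":
    import sys
    for rec in parse_drm_errors(sys.stdin):
        print(rec["stamp"], rec["pending"], rec["msg"])
```

Piping the output of the grep command into the script prints one line per error, which makes it easier to compare the timestamps with the moment the machine stopped responding.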
Re: Catastrophic
Yeah like Stuart said I need to reproduce the crash and get inside the machine when it’s in that state. To be continued.

Best

On Fri, Feb 28, 2020 at 7:42 PM Avon Robertson wrote:
> Hello Justin and Stuart,
>
> I hope the following may be of help in solving the cause of the crash.
>
> I have experienced a similar type of crash when using X on this machine
> for approximately the last 6 weeks. Prior to this, X had been running on
> this machine without apparent problems for 12 plus months.
Re: Catastrophic
Awesome - thank you for your time and for the valuable information.

That’s hilarious about the serial port. I’ll try plugging into a switch, reproducing the crash, and SSHing into it. I still haven’t tried the syslogd tip you mentioned either. It’s time for me to start learning more about X. Will be in touch.

Regards

On Fri, Feb 28, 2020 at 6:57 AM Stuart Longland wrote:
> One suggestion I made too was to maybe try setting up a serial console
> link… turns out the motherboard makers know how to tease:
>
> > com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
> > com0: probed fifo depth: 0 bytes
>
> That says there is a RS-232 port somewhere… so I had a look at the
> handbook:
>
> https://dlcdnets.asus.com/pub/ASUS/mb/SocketAM4/ROG_STRIX_B450-I_GAMING/E14337_ROG_STRIX_B450-I_GAMING_UM_PRINT.pdf
>
> They didn't wire it up to a pin header, which is annoying.
Re: Catastrophic
On Sat, Feb 29, 2020 at 12:57:07AM +1000, Stuart Longland wrote:
> Sometimes it might be the way a particular graphics toolkit "tickles"
> the video hardware too. For instance FVWM uses libxcb for drawing
> graphics which means you're likely to be just working with 2D primitives.
>
> Then Firefox with its GTK+ back-end fires off a few RENDER extension
> requests to the X server and whoopsie! Down she goes!

Hello Justin and Stuart,

I hope the following may be of help in solving the cause of the crash.

I have experienced a similar type of crash when using X on this machine for approximately the last 6 weeks. Prior to this, X had been running on this machine without apparent problems for 12 plus months.

The only browser installed on this machine is lynx(1). My crashes have been random with no recognised culprit at the time of the crash, which usually occurred within 10 minutes of invoking startx(1).

fvwm(1) is the only window manager installed on this machine. All my crashes have required the machine to be powered off to regain control.

This machine's graphics card was identified by its vendor as a:
Sapphire Nitro+ RX580 8G GDDR5 Graphics Card 2X HDMI + 2X Display+DVI Port.
This machine is connected to its monitor using a Display Port cable.

This machine has worked and is working without problems from a console, with and without tmux(1). If multiple consoles are run at the same time however, when exit(3) is invoked from one of them the time taken to exit is sometimes longer than 10 seconds. This seems odd to me.

Please find below the contents of this machine's /var/run/dmesg.boot.

OpenBSD 6.6-current (GENERIC.MP) #0: Sun Feb 23 00:07:16 MST 2020
    dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 68644982784 (65464MB)
avail mem = 66551980032 (63468MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xe8980 (59 entries)
bios0: vendor American Megatrends Inc. version "F1" date 03/01/2018
bios0: Gigabyte Technology Co., Ltd. X470 AORUS ULTRA GAMING
acpi0 at bios0: ACPI 6.0
acpi0: sleep states S0 S3 S4
Re: Catastrophic
On 28/2/20 11:32 pm, Justin Noor wrote:
> Thanks for offering to help and sorry for the delay - I got dragged into a
> work emergency. I finally managed to SCP my dmesg to a remote machine.

Heh, no problems, these things happen.

> As a refresher I have a 6.6 current machine that crashes when X is running,
> and almost instantly when Firefox is running - it runs fine without X. The
> machine becomes totally frozen - I have to perform a forced shutdown to
> exit this state. The issue appears to be graphics related and is
> inconsistent - sometimes it crashes immediately, other times it does not.

Sometimes it might be the way a particular graphics toolkit "tickles" the video hardware too. For instance FVWM uses libxcb for drawing graphics which means you're likely to be just working with 2D primitives.

Then Firefox with its GTK+ back-end fires off a few RENDER extension requests to the X server and whoopsie! Down she goes!

> There are indeed some "unknown product" messages related to my PCI graphics
> card in my dmesg, but I haven't been able to decipher them yet. Those
> usually mean the device is not supported, but it is, and I'm sure I have
> the correct driver (amdgpu0). Previously I had no issues for months, which
> is why I suspected hardware failure. Admittedly I've been lucky with
> graphics cards over the years, and don't know much about PCI.

No issues for months running a previous version of OpenBSD or the same you're running now?

One suggestion I made too was to maybe try setting up a serial console link… turns out the motherboard makers know how to tease:

> com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
> com0: probed fifo depth: 0 bytes

That says there is a RS-232 port somewhere… so I had a look at the handbook:

https://dlcdnets.asus.com/pub/ASUS/mb/SocketAM4/ROG_STRIX_B450-I_GAMING/E14337_ROG_STRIX_B450-I_GAMING_UM_PRINT.pdf

They didn't wire it up to a pin header, which is annoying.

On the video front, I did see this:

> initializing kernel modesetting (POLARIS11 0x1002:0x67EF 0x1002:0x0B04 0xE5).
> amdgpu_irq_add_domain: stub
> amdgpu_device_resize_fb_bar: stub
> amdgpu: [powerplay] Failed to retrieve minimum clocks.
> amdgpu0: 1360x768, 32bpp
> wsdisplay0 at amdgpu0 mux 1: console (std, vt100 emulation), using wskbd0
> wskbd1: connecting to wsdisplay0
> wsdisplay0: screen 1-5 added (std, vt100 emulation)

The "stub" messages make me wonder if we're hitting some not-yet-implemented features. That "failed to retrieve minimum clocks" has been seen on Linux as well, and there it was related to PCI prefetch register programming.

The machine you've got isn't much different to what I have at work actually: Ryzen 7 1700 (so previous generation), and a RX550 video card (POLARIS12, maybe slightly newer?)… the machine is fitted with a RS-232 serial port so I might try a little experiment with a USB stick and see if I can install OpenBSD 6.6 to USB storage and try to reproduce the crash.
--
Stuart Longland (aka Redhatter, VK4MSL)

I haven't lost my mind...
  ...it's backed up on a tape somewhere.
Re: Catastrophic
Thanks for offering to help and sorry for the delay - I got dragged into a work emergency. I finally managed to SCP my dmesg to a remote machine.

As a refresher I have a 6.6 current machine that crashes when X is running, and almost instantly when Firefox is running - it runs fine without X. The machine becomes totally frozen - I have to perform a forced shutdown to exit this state. The issue appears to be graphics related and is inconsistent - sometimes it crashes immediately, other times it does not.

There are indeed some "unknown product" messages related to my PCI graphics card in my dmesg, but I haven't been able to decipher them yet. Those usually mean the device is not supported, but it is, and I'm sure I have the correct driver (amdgpu0). Previously I had no issues for months, which is why I suspected hardware failure. Admittedly I've been lucky with graphics cards over the years, and don't know much about PCI.

dmesg:

OpenBSD 6.6-current (GENERIC) #606: Fri Jan 31 19:02:51 MST 2020
    dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC
real mem = 34268147712 (32680MB)
avail mem = 33217200128 (31678MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xe68e0 (48 entries)
bios0: vendor American Megatrends Inc. version "1001" date 09/27/2018
bios0: ASUSTeK COMPUTER INC. ROG STRIX B450-I GAMING
acpi0 at bios0: ACPI 6.0
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP APIC FPDT FIDT SSDT SSDT CRAT CDIT SSDT MCFG SSDT HPET SSDT UEFI BGRT WPBT IVRS SSDT
acpi0: wakeup devices GPP0(S4) GPP0(S4) GPP1(S4) GPP3(S4) GPP4(S4) GPP5(S4) GPP6(S4) GPP7(S4) GPP8(S4) X161(S4) GPP9(S4) X162(S4) GPPA(S4) GPPB(S4) GPPC(S4) GPPD(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 32 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD Ryzen 5 2600 Six-Core Processor, 3394.18 MHz, 17-08-02
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,IBPB,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu0: 64KB 64b/line 4-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 8-way L2 cache, 16MB 64b/line 16-way L3 cache
cpu0: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu0: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=1.1, IBE
cpu at mainbus0: not configured
cpu at mainbus0: not configured
cpu at mainbus0: not configured
cpu at mainbus0: not configured
cpu at mainbus0: not configured
cpu at mainbus0: not configured
cpu at mainbus0: not configured
cpu at mainbus0: not configured
cpu at mainbus0: not configured
cpu at mainbus0: not configured
cpu at mainbus0: not configured
ioapic0 at mainbus0: apid 13 pa 0xfec0, version 21, 24 pins
ioapic1 at mainbus0: apid 14 pa 0xfec01000, version 21, 32 pins
acpimcfg0 at acpi0
acpimcfg0: addr 0xf800, bus 0-63
acpihpet0 at acpi0: 14318180 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus -1 (GPP0)
acpiprt2 at acpi0: bus -1 (GPP1)
acpiprt3 at acpi0: bus -1 (GPP3)
acpiprt4 at acpi0: bus -1 (GPP4)
acpiprt5 at acpi0: bus -1 (GPP5)
acpiprt6 at acpi0: bus -1 (GPP6)
acpiprt7 at acpi0: bus -1 (GPP7)
acpiprt8 at acpi0: bus 6 (GPP8)
acpiprt9 at acpi0: bus -1 (GPP9)
acpiprt10 at acpi0: bus -1 (GPPA)
acpiprt11 at acpi0: bus -1 (GPPB)
acpiprt12 at acpi0: bus -1 (GPPC)
acpiprt13 at acpi0: bus -1 (GPPD)
acpiprt14 at acpi0: bus -1 (GPPE)
acpiprt15 at acpi0: bus -1 (GPPF)
acpiprt16 at acpi0: bus 7 (GP17)
acpiprt17 at acpi0: bus 8 (GP18)
acpiprt18 at acpi0: bus 1 (GPP2)
acpiec0 at acpi0
acpicpu0 at acpi0: C2(0@400 io@0x414), C1(0@1 mwait), PSS
acpipci0 at acpi0 PCI0: 0x0010 0x0011 0x
acpicmos0 at acpi0
acpibtn0 at acpi0: PWRB
amdgpio0 at acpi0: GPIO uid 0 addr 0xfed81500/0x400 irq 7, 184 pins
"AMDIF030" at acpi0 not configured
"PNP0C14" at acpi0 not configured
"PNP0C14" at acpi0 not configured
"PNP0C14" at acpi0 not configured
cpu0: 3394 MHz: speeds: 3400 2800 1550 MHz
pci0 at mainbus0 bus 0
ksmn0 at pci0 dev 0 function 0 "AMD 17h Root Complex" rev 0x00
"AMD 17h IOMMU" rev 0x00 at pci0 dev 0 function 2 not configured
pchb0 at pci0 dev 1 function 0 "AMD 17h PCIE" rev 0x00
ppb0 at pci0 dev 1 function 3 "AMD 17h PCIE" rev 0x00: msi
pci1 at ppb0 bus 1
xhci0 at pci1 dev 0 function 0 vendor "AMD", unknown product 0x43d5 rev 0x01: msi, xHCI 1.10
usb0 at xhci0: USB revision 3.0
uhub0 at usb0 configuration 1 interface 0 "AMD xHCI root hub" rev 3.00/1.00 addr 1
ahci0 at pci1 dev 0 function 1 "AMD 400 Series AHCI" rev 0x01: msi, AHCI
Re: Catastrophic
Yes the machine runs without X. I can scp a copy of my dmesg to a remote machine and go from there. Will be in touch soon.

Thank you.

On Sun, Feb 9, 2020 at 3:06 PM Stuart Longland wrote:
> On 27/1/20 11:59 pm, Justin Noor wrote:
> > I am unable to send any log files or anything. I had to send this
> > email from a different machine. I can take pictures of log files and
> > transfer the information, but I'm not sure where to start.
>
> A `dmesg` before the crash would at least tell us whether there's
> problematic hardware/drivers in use. Even though it's not taken at the
> moment of the crash doesn't mean it's worthless.
>
> Has the machine got a serial port? Maybe you could hook that up to a
> logging terminal emulator on another computer via a null-modem cable?
> (It may need to be a PCI(e)-connected serial port rather than USB, not
> many OSes support serial console over USB due to the complexities of USB
> itself.)
>
> Maybe you could configure syslogd(8) to send its logs via UDP to a
> syslog on another computer? It might not catch the very last log
> messages, but maybe might capture enough?
Re: Catastrophic
On 27/1/20 11:59 pm, Justin Noor wrote:
> I am unable to send any log files or anything. I had to send this
> email from a different machine. I can take pictures of log files and
> transfer the information, but I'm not sure where to start.

A `dmesg` before the crash would at least tell us whether there's
problematic hardware/drivers in use. Even though it's not taken at the
moment of the crash doesn't mean it's worthless.

Has the machine got a serial port? Maybe you could hook that up to a
logging terminal emulator on another computer via a null-modem cable?
(It may need to be a PCI(e)-connected serial port rather than USB, not
many OSes support serial console over USB due to the complexities of USB
itself.)

Maybe you could configure syslogd(8) to send its logs via UDP to a
syslog on another computer? It might not catch the very last log
messages, but maybe might capture enough?
--
Stuart Longland (aka Redhatter, VK4MSL)

I haven't lost my mind...
...it's backed up on a tape somewhere.
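
For the receiving end of the syslogd(8) suggestion above: if the other computer has no syslog daemon set up to accept network input, a throwaway UDP listener may be enough to catch whatever the crashing machine manages to send. The following is only a minimal sketch, assuming Python 3 on the receiving computer. The function names, the log file name, and port 5514 are hypothetical choices made for illustration (the standard syslog UDP port is 514, which normally requires root to bind). The crashing machine would still need a forwarding entry such as `*.* @loghost` in its syslog.conf, where `loghost` is a placeholder for the receiver's address; see syslog.conf(5) for the exact syntax, including how to name a non-default port.

```python
import socket


def open_listener(host="0.0.0.0", port=5514):
    """Bind a UDP socket. Port 5514 is an arbitrary unprivileged choice (assumption)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_lines(sock, count, logfile=None):
    """Read `count` datagrams and return them as 'sender-address message' strings.

    If `logfile` is given, each message is appended to it and the file is
    closed again right away, so nothing stays in a buffer on this side.
    """
    lines = []
    for _ in range(count):
        data, (sender, _port) = sock.recvfrom(8192)
        text = data.decode("utf-8", errors="replace").rstrip()
        line = f"{sender} {text}"
        lines.append(line)
        if logfile is not None:
            with open(logfile, "a", encoding="utf-8") as fh:
                fh.write(line + "\n")
    return lines
```

Calling `receive_lines(open_listener(), 1, logfile="remote-messages.log")` in a loop would keep a running copy on the second machine. As noted above, UDP gives no delivery guarantee, so the last messages before a hard freeze may still be missing.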
Re: Catastrophic
On Mon, 27 Jan 2020 at 13:59, Justin Noor wrote:
>
> Hello community,
>
> I'm looking for any advice on how to troubleshoot some strange and
> catastrophic behavior on my OpenBSD machine. Seemingly out of nowhere, it
> started freezing to the extent that only a forced shutdown (holding down
> the power button) gets me out of it. I suspect it's some kind of hardware
> failure, but I'm not 100% sure. It crashes when xenodm is running.
> Especially with firefox--it crashes instantly. If I disable xenodm it runs
> fine. I am unable to send any log files or anything. I had to send this
> email from a different machine. I can take pictures of log files and
> transfer the information, but I'm not sure where to start. Any feedback
> would be greatly appreciated.

You should have old copies of messages in /var/log:

oc@OpenBSD:~$ ls /var/log/messages*
/var/log/messages        /var/log/messages.1.gz
/var/log/messages.0.gz   /var/log/messages.2.gz

--
Ottavio Caruso
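
The rotated files listed above are gzip-compressed, so going through them by hand is tedious. Below is a minimal sketch, assuming Python 3 is available, that scans the current and rotated messages files for lines worth a closer look. The function name `grep_logs` and the default keywords (`amdgpu`, `drm`, `Xorg`) are assumptions picked for this thread's symptoms, not an established list; adjust them as needed.

```python
import glob
import gzip


def grep_logs(pattern="/var/log/messages*", keywords=("amdgpu", "drm", "Xorg")):
    """Return (path, line) pairs for log lines containing any keyword.

    Matching is case-insensitive. Files ending in .gz are read through
    gzip; everything else is read as plain text.
    """
    wanted = [k.lower() for k in keywords]
    hits = []
    for path in sorted(glob.glob(pattern)):
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt", encoding="utf-8", errors="replace") as fh:
            for line in fh:
                lowered = line.lower()
                if any(k in lowered for k in wanted):
                    hits.append((path, line.rstrip("\n")))
    return hits
```

Printing the result with `for path, line in grep_logs(): print(path, line)` gives one match per line, with the file name in front so it is clear which rotation a message came from.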
Re: Catastrophic
A full dmesg would certainly help. Also, do you see anything in /var/log/messages?

On Mon, Jan 27, 2020 at 4:01 PM Justin Noor wrote:
>
> Hello community,
>
> I'm looking for any advice on how to troubleshoot some strange and
> catastrophic behavior on my OpenBSD machine. Seemingly out of nowhere, it
> started freezing to the extent that only a forced shutdown (holding down
> the power button) gets me out of it. I suspect it's some kind of hardware
> failure, but I'm not 100% sure. It crashes when xenodm is running.
> Especially with firefox--it crashes instantly. If I disable xenodm it runs
> fine. I am unable to send any log files or anything. I had to send this
> email from a different machine. I can take pictures of log files and
> transfer the information, but I'm not sure where to start. Any feedback
> would be greatly appreciated.
>
> Machine specs:
>
> Version: 6.6 Current (always up-to-date)
> Architecture: amd64
> Kernel: '$ uname -a' OpenBSD myhost.myhost.com 6.6 GENERIC#601 amd64
> Chipset: AMD Ryzen 5
> GPU: Radeon RX 560 series, amdgpu0: msi
>
> Thank you,
>
> Justin Noor
Catastrophic
Hello community,

I'm looking for any advice on how to troubleshoot some strange and catastrophic behavior on my OpenBSD machine. Seemingly out of nowhere, it started freezing to the extent that only a forced shutdown (holding down the power button) gets me out of it. I suspect it's some kind of hardware failure, but I'm not 100% sure. It crashes when xenodm is running. Especially with firefox--it crashes instantly. If I disable xenodm it runs fine. I am unable to send any log files or anything. I had to send this email from a different machine. I can take pictures of log files and transfer the information, but I'm not sure where to start. Any feedback would be greatly appreciated.

Machine specs:

Version: 6.6 Current (always up-to-date)
Architecture: amd64
Kernel: '$ uname -a' OpenBSD myhost.myhost.com 6.6 GENERIC#601 amd64
Chipset: AMD Ryzen 5
GPU: Radeon RX 560 series, amdgpu0: msi

Thank you,

Justin Noor