Re: Catastrophic machine freezes - X related

2020-03-08 Thread Justin Noor
Are you using tmux with or without X? We’re getting different errors. So
far, my errors are definitely X-related.

Coincidentally I was just working on this. My machine crashed, and my logs
are showing:

rwsleep_nsec: Xorg[98908]: fsleep: trying to sleep zero nanoseconds

I’m looking into it as we speak.


Re: Catastrophic

2020-03-08 Thread Avon Robertson
Hello Justin and Stuart,

It is possible that the errors that I have found in /var/log/messages*
are unrelated to the above. Thoughts?

I have noticed that the freezes on this machine occur more quickly if I
am working within tmux(1), as I was at the time of the last freeze.
That may have been sheer coincidence.

$ grep ERROR /var/log/messag*
/var/log/messages:Mar  8 16:20:10 gx470 /bsd: [drm] *ERROR* ring gfx timeout, signaled seq=385, emitted seq=387
/var/log/messages:Mar  9 07:06:34 gx470 /bsd: [drm] *ERROR* Illegal register access in command stream
/var/log/messages:Mar  9 07:06:44 gx470 /bsd: [drm] *ERROR* ring gfx timeout, signaled seq=794, emitted seq=796

My machine's last freeze occurred at the time of the last error in
/var/log/messages. I am able to log in to this machine remotely and
access files when it is frozen, using kermit(1) and a USB-to-serial
adapter. The machine's /var/run/dmesg.boot can be found in my first
email to this thread.
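
For anyone who wants to compare these drm errors across machines, the
"ring gfx timeout" lines can be pulled out of a log with a few lines of
Python. This is only a sketch: the pattern is inferred from the three
lines quoted above (joined as they appear in the log file, not as
wrapped by the mailer), not from the driver source, and the name
ring_timeouts is made up for illustration.

```python
import re

# Pattern inferred from the log lines quoted in this message; other
# kernels or drivers may word the message differently (assumption).
TIMEOUT_RE = re.compile(
    r"(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+/bsd:\s+\[drm\] \*ERROR\* "
    r"ring (?P<ring>\w+) timeout,\s+"
    r"signaled seq=(?P<signaled>\d+),\s+emitted seq=(?P<emitted>\d+)"
)


def ring_timeouts(lines):
    """Return one dict per 'ring ... timeout' line found in `lines`."""
    hits = []
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m is None:
            continue  # some other message, e.g. "Illegal register access"
        signaled = int(m.group("signaled"))
        emitted = int(m.group("emitted"))
        hits.append({
            "stamp": m.group("stamp"),
            "host": m.group("host"),
            "ring": m.group("ring"),
            "signaled": signaled,
            "emitted": emitted,
            # Gap between the two sequence numbers reported by the driver.
            "outstanding": emitted - signaled,
        })
    return hits


# The three lines from the grep output above, without the file name prefix.
SAMPLE = [
    "Mar  8 16:20:10 gx470 /bsd: [drm] *ERROR* ring gfx timeout, signaled seq=385, emitted seq=387",
    "Mar  9 07:06:34 gx470 /bsd: [drm] *ERROR* Illegal register access in command stream",
    "Mar  9 07:06:44 gx470 /bsd: [drm] *ERROR* ring gfx timeout, signaled seq=794, emitted seq=796",
]

for hit in ring_timeouts(SAMPLE):
    print(hit["stamp"], hit["ring"], hit["outstanding"])
```

In both timeouts quoted above the emitted sequence number is 2 ahead of
the signaled one. As far as I understand it, that gap is the number of
submissions the ring had not completed when the timeout fired, but I
have not checked that against the driver code.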

Regards Avon

-- 
aer



Re: Catastrophic

2020-02-29 Thread Justin Noor
Yeah, like Stuart said, I need to reproduce the crash and get inside the
machine when it’s in that state. To be continued.

Best


Re: Catastrophic

2020-02-29 Thread Justin Noor
Awesome - thank you for your time and for the valuable information.

That’s hilarious about the serial port. I’ll try plugging into a switch,
reproducing the crash, and SSHing into it. I still haven’t tried the
syslogd tip you mentioned either. It’s time for me to start learning more
about X. Will be in touch.

Regards



Re: Catastrophic

2020-02-28 Thread Avon Robertson
Hello Justin and Stuart,

I hope the following helps in finding the cause of the crash.

I have experienced a similar type of crash when using X on this machine
for approximately the last 6 weeks. Prior to this, X had been running on
this machine without apparent problems for 12 plus months.

The only browser installed on this machine is lynx(1). My crashes have
been random with no recognised culprit at the time of the crash, which
usually occurred within 10 minutes of invoking startx(1).

fvwm(1) is the only window manager installed on this machine. All my
crashes have required the machine to be powered off to regain control.

This machine's graphics card was identified by its vendor as a:
  Sapphire Nitro+ RX580 8G GDDR5 Graphics Card 2X HDMI + 2X Display+DVI
  Port.
This machine is connected to its monitor using a DisplayPort cable.

This machine has worked and is working without problems from a console,
with and without tmux(1). However, if multiple consoles are running at
the same time, when exit(3) is invoked from one of them the exit
sometimes takes longer than 10 seconds. This seems odd to me.

Please find below the contents of this machine's /var/run/dmesg.boot.

OpenBSD 6.6-current (GENERIC.MP) #0: Sun Feb 23 00:07:16 MST 2020
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 68644982784 (65464MB)
avail mem = 66551980032 (63468MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xe8980 (59 entries)
bios0: vendor American Megatrends Inc. version "F1" date 03/01/2018
bios0: Gigabyte Technology Co., Ltd. X470 AORUS ULTRA GAMING
acpi0 at bios0: ACPI 6.0
acpi0: sleep states S0 S3 

Re: Catastrophic

2020-02-28 Thread Stuart Longland
On 28/2/20 11:32 pm, Justin Noor wrote:
> Thanks for offering to help and sorry for the delay - I got dragged into a
> work emergency. I finally managed to SCP my dmesg to a remote machine.

Heh, no problems, these things happen.

> As a refresher I have a 6.6 current machine that crashes when X is running,
> and almost instantly when Firefox is running - it runs fine without X. The
> machine becomes totally frozen - I have to perform a forced shutdown to
> exit this state. The issue appears to be graphics related and is
> inconsistent - sometimes it crashes immediately, other times it does not.

Sometimes it might be the way a particular graphics toolkit "tickles"
the video hardware too.  For instance FVWM uses libxcb for drawing
graphics which means you're likely to be just working with 2D primitives.

Then Firefox with its GTK+ back-end fires off a few RENDER extension
requests to the X server and whoopsie!  Down she goes!

> There are indeed some "unknown product" messages related to my PCI graphics
> card in my dmesg, but I haven't been able to decipher them yet. Those
> usually mean the device is not supported, but it is, and I'm sure I have
> the correct driver (amdgpu0). Previously I had no issues for months, which
> is why I suspected hardware failure. Admittedly I've been lucky with
> graphics cards over the years, and don't know much about PCI.

No issues for months running a previous version of OpenBSD or the same
you're running now?

One suggestion I made too was to maybe try setting up a serial console
link… turns out the motherboard makers know how to tease:

> com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
> com0: probed fifo depth: 0 bytes

That says there is an RS-232 port somewhere… so I had a look at the handbook:
https://dlcdnets.asus.com/pub/ASUS/mb/SocketAM4/ROG_STRIX_B450-I_GAMING/E14337_ROG_STRIX_B450-I_GAMING_UM_PRINT.pdf

They didn't wire it up to a pin header, which is annoying.

On the video front, I did see this:
> initializing kernel modesetting (POLARIS11 0x1002:0x67EF 0x1002:0x0B04
> 0xE5).
> amdgpu_irq_add_domain: stub
> amdgpu_device_resize_fb_bar: stub
> amdgpu: [powerplay] Failed to retrieve minimum clocks.
> amdgpu0: 1360x768, 32bpp
> wsdisplay0 at amdgpu0 mux 1: console (std, vt100 emulation), using wskbd0
> wskbd1: connecting to wsdisplay0
> wsdisplay0: screen 1-5 added (std, vt100 emulation)

The "stub" messages make me wonder if we're hitting some
not-yet-implemented features.  That "failed to retrieve minimum clocks"
has been seen on Linux as well, and there it was related to PCI prefetch
register programming.

The machine you've got isn't much different to what I have at work
actually: Ryzen 7 1700 (so previous generation), and a RX550 video card
(POLARIS12, maybe slightly newer?)… the machine is fitted with a RS-232
serial port so I might try a little experiment with a USB stick and see
if I can install OpenBSD 6.6 to USB storage and try to reproduce the crash.
-- 
Stuart Longland (aka Redhatter, VK4MSL)

I haven't lost my mind...
  ...it's backed up on a tape somewhere.



Re: Catastrophic

2020-02-28 Thread Justin Noor
Thanks for offering to help and sorry for the delay - I got dragged into a
work emergency. I finally managed to SCP my dmesg to a remote machine.

As a refresher, I have a 6.6-current machine that crashes when X is running,
and almost instantly when Firefox is running - it runs fine without X. The
machine becomes totally frozen - I have to perform a forced shutdown to
exit this state. The issue appears to be graphics related and is
inconsistent - sometimes it crashes immediately, other times it does not.
There are indeed some "unknown product" messages related to my PCI graphics
card in my dmesg, but I haven't been able to decipher them yet. Those
usually mean the device is not supported, but it is, and I'm sure I have
the correct driver (amdgpu0). Previously I had no issues for months, which
is why I suspected hardware failure. Admittedly I've been lucky with
graphics cards over the years, and don't know much about PCI.
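
To make the "unknown product" lines easier to find in a long dmesg,
something like the following Python sketch could list them. The pattern
is taken from the wording in the dmesg below and is an assumption for
any other kernel or bus; the name unknown_products is made up for
illustration.

```python
import re

# Wording taken from the dmesg in this message; treat the pattern as an
# assumption for other kernels or buses.
UNKNOWN_RE = re.compile(
    r'vendor "(?P<vendor>[^"]+)", unknown product (?P<product>0x[0-9a-fA-F]+)'
)
# A line for an attached device starts with "<name> at <parent>".
ATTACHED_RE = re.compile(r"(?P<device>\w+) at ")


def unknown_products(dmesg_lines):
    """Return (device, vendor, product_id) for each 'unknown product' line.

    `device` is the driver instance at the start of the line (for example
    "xhci0"), or None when the line does not begin with "<name> at ".
    """
    found = []
    for line in dmesg_lines:
        m = UNKNOWN_RE.search(line)
        if m is None:
            continue
        attached = ATTACHED_RE.match(line)
        device = attached.group("device") if attached else None
        found.append((device, m.group("vendor"), m.group("product")))
    return found


# Three lines from the dmesg below, with the mailer's line wrap undone.
SAMPLE = [
    'ppb0 at pci0 dev 1 function 3 "AMD 17h PCIE" rev 0x00: msi',
    'xhci0 at pci1 dev 0 function 0 vendor "AMD", unknown product 0x43d5 rev 0x01: msi, xHCI 1.10',
    'ahci0 at pci1 dev 0 function 1 "AMD 400 Series AHCI" rev 0x01: msi,',
]

for device, vendor, product in unknown_products(SAMPLE):
    print(device, vendor, product)
```

On the part of the dmesg reproduced below, the only match is the xhci0
USB controller, and it still attached to a driver. So an "unknown
product" line may only mean the ID has no name in the kernel's device
table, not that the device is unsupported. To run this on a whole
file, pass an open /var/run/dmesg.boot to unknown_products().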

dmesg:

OpenBSD 6.6-current (GENERIC) #606: Fri Jan 31 19:02:51 MST 2020
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC
real mem = 34268147712 (32680MB)
avail mem = 33217200128 (31678MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xe68e0 (48 entries)
bios0: vendor American Megatrends Inc. version "1001" date 09/27/2018
bios0: ASUSTeK COMPUTER INC. ROG STRIX B450-I GAMING
acpi0 at bios0: ACPI 6.0
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP APIC FPDT FIDT SSDT SSDT CRAT CDIT SSDT MCFG SSDT HPET SSDT UEFI BGRT WPBT IVRS SSDT
acpi0: wakeup devices GPP0(S4) GPP0(S4) GPP1(S4) GPP3(S4) GPP4(S4) GPP5(S4) GPP6(S4) GPP7(S4) GPP8(S4) X161(S4) GPP9(S4) X162(S4) GPPA(S4) GPPB(S4) GPPC(S4) GPPD(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 32 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD Ryzen 5 2600 Six-Core Processor, 3394.18 MHz, 17-08-02
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,IBPB,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu0: 64KB 64b/line 4-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 8-way L2 cache, 16MB 64b/line 16-way L3 cache
cpu0: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu0: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=1.1, IBE
cpu at mainbus0: not configured
cpu at mainbus0: not configured
cpu at mainbus0: not configured
cpu at mainbus0: not configured
cpu at mainbus0: not configured
cpu at mainbus0: not configured
cpu at mainbus0: not configured
cpu at mainbus0: not configured
cpu at mainbus0: not configured
cpu at mainbus0: not configured
cpu at mainbus0: not configured
ioapic0 at mainbus0: apid 13 pa 0xfec0, version 21, 24 pins
ioapic1 at mainbus0: apid 14 pa 0xfec01000, version 21, 32 pins
acpimcfg0 at acpi0
acpimcfg0: addr 0xf800, bus 0-63
acpihpet0 at acpi0: 14318180 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus -1 (GPP0)
acpiprt2 at acpi0: bus -1 (GPP1)
acpiprt3 at acpi0: bus -1 (GPP3)
acpiprt4 at acpi0: bus -1 (GPP4)
acpiprt5 at acpi0: bus -1 (GPP5)
acpiprt6 at acpi0: bus -1 (GPP6)
acpiprt7 at acpi0: bus -1 (GPP7)
acpiprt8 at acpi0: bus 6 (GPP8)
acpiprt9 at acpi0: bus -1 (GPP9)
acpiprt10 at acpi0: bus -1 (GPPA)
acpiprt11 at acpi0: bus -1 (GPPB)
acpiprt12 at acpi0: bus -1 (GPPC)
acpiprt13 at acpi0: bus -1 (GPPD)
acpiprt14 at acpi0: bus -1 (GPPE)
acpiprt15 at acpi0: bus -1 (GPPF)
acpiprt16 at acpi0: bus 7 (GP17)
acpiprt17 at acpi0: bus 8 (GP18)
acpiprt18 at acpi0: bus 1 (GPP2)
acpiec0 at acpi0
acpicpu0 at acpi0: C2(0@400 io@0x414), C1(0@1 mwait), PSS
acpipci0 at acpi0 PCI0: 0x0010 0x0011 0x
acpicmos0 at acpi0
acpibtn0 at acpi0: PWRB
amdgpio0 at acpi0: GPIO uid 0 addr 0xfed81500/0x400 irq 7, 184 pins
"AMDIF030" at acpi0 not configured
"PNP0C14" at acpi0 not configured
"PNP0C14" at acpi0 not configured
"PNP0C14" at acpi0 not configured
cpu0: 3394 MHz: speeds: 3400 2800 1550 MHz
pci0 at mainbus0 bus 0
ksmn0 at pci0 dev 0 function 0 "AMD 17h Root Complex" rev 0x00
"AMD 17h IOMMU" rev 0x00 at pci0 dev 0 function 2 not configured
pchb0 at pci0 dev 1 function 0 "AMD 17h PCIE" rev 0x00
ppb0 at pci0 dev 1 function 3 "AMD 17h PCIE" rev 0x00: msi
pci1 at ppb0 bus 1
xhci0 at pci1 dev 0 function 0 vendor "AMD", unknown product 0x43d5 rev 0x01: msi, xHCI 1.10
usb0 at xhci0: USB revision 3.0
uhub0 at usb0 configuration 1 interface 0 "AMD xHCI root hub" rev 3.00/1.00 addr 1
ahci0 at pci1 dev 0 function 1 "AMD 400 Series AHCI" rev 0x01: msi, 

Re: Catastrophic

2020-02-11 Thread Justin Noor
Yes, the machine runs without X. I can scp a copy of my dmesg to a remote
machine and go from there. Will be in touch soon. Thank you.

On Sun, Feb 9, 2020 at 3:06 PM Stuart Longland wrote:

> On 27/1/20 11:59 pm, Justin Noor wrote:
> > I am unable to send any log files or anything. I had to send this
> > email from a different machine. I can take pictures of log files and
> > transfer the information, but I'm not sure where to start.
>
> A `dmesg` from before the crash would at least tell us whether there's
> problematic hardware/drivers in use.  Just because it's not taken at the
> moment of the crash doesn't mean it's worthless.
>
> Has the machine got a serial port?  Maybe you could hook that up to a
> logging terminal emulator on another computer via a null-modem cable?
> (It may need to be a PCI(e)-connected serial port rather than USB; not
> many OSes support a serial console over USB due to the complexities of USB
> itself.)
>
> Maybe you could configure syslogd(8) to send its logs via UDP to a
> syslog on another computer?  It might not catch the very last log
> messages, but it might capture enough.
> --
> Stuart Longland (aka Redhatter, VK4MSL)
>
> I haven't lost my mind...
>   ...it's backed up on a tape somewhere.
>
>


Re: Catastrophic

2020-02-09 Thread Stuart Longland
On 27/1/20 11:59 pm, Justin Noor wrote:
> I am unable to send any log files or anything. I had to send this
> email from a different machine. I can take pictures of log files and
> transfer the information, but I'm not sure where to start.

A `dmesg` from before the crash would at least tell us whether there's
problematic hardware/drivers in use.  Just because it's not taken at the
moment of the crash doesn't mean it's worthless.

Has the machine got a serial port?  Maybe you could hook that up to a
logging terminal emulator on another computer via a null-modem cable?
(It may need to be a PCI(e)-connected serial port rather than USB; not
many OSes support a serial console over USB due to the complexities of USB
itself.)
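
For the serial-console route, a minimal sketch follows. It assumes the
machine has a first serial port that the OpenBSD boot loader sees as com0
and that 115200 baud suits both ends. The function name
write_serial_bootconf and the path argument are made up for illustration;
the two lines it writes are boot(8) commands.

```shell
# Write serial-console settings to a boot.conf-style file.
# Assumption: the port is com0. On OpenBSD the real file is /etc/boot.conf;
# the path is a parameter so the function can be tried on a scratch file.
# Note: this overwrites the target file.
write_serial_bootconf() {
    conf=$1
    speed=${2:-115200}
    {
        printf 'stty com0 %s\n' "$speed"   # set the line speed
        printf 'set tty com0\n'            # move the console to the serial port
    } > "$conf"
}

# Example (as root, after checking the scratch output):
#   write_serial_bootconf /etc/boot.conf
```

On the logging computer, running `cu -l cua00 -s 115200` inside a script(1)
session would keep a transcript of whatever the console prints. cua00 is
the OpenBSD name for the first serial line and is an assumption about that
other computer.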

Maybe you could configure syslogd(8) to send its logs via UDP to a
syslog on another computer?  It might not catch the very last log
messages, but it might capture enough.
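
A rough sketch of the syslogd(8) idea, assuming the receiving computer is
reachable at 192.0.2.10 (a placeholder address) and that forwarding every
facility and level is acceptable. The helper name add_forward_rule is
hypothetical; the `*.* @host` line it writes is the syslog.conf(5) form for
forwarding over UDP.

```shell
# Append a "forward everything via UDP" rule to a syslog.conf-style file,
# but only if that exact line is not already present.
# Assumptions: the file path and the log host are supplied by the caller.
add_forward_rule() {
    conf=$1
    loghost=$2
    rule=$(printf '*.*\t@%s' "$loghost")
    grep -qxF -- "$rule" "$conf" 2>/dev/null || printf '%s\n' "$rule" >> "$conf"
}

# Example on the crashing machine (as root):
#   add_forward_rule /etc/syslog.conf 192.0.2.10
#   rcctl restart syslogd
# On an OpenBSD receiver, syslogd must be told to accept UDP input:
#   rcctl set syslogd flags -u
#   rcctl restart syslogd
```

UDP forwarding is fire-and-forget, so the final messages before a hard
freeze may still be lost.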
-- 
Stuart Longland (aka Redhatter, VK4MSL)

I haven't lost my mind...
  ...it's backed up on a tape somewhere.



Re: Catastrophic

2020-01-27 Thread Ottavio Caruso
On Mon, 27 Jan 2020 at 13:59, Justin Noor wrote:
>
> Hello community,
>
> I'm looking for any advice on how to troubleshoot some strange and
> catastrophic behavior on my OpenBSD machine. Seemingly out of nowhere, it
> started freezing to the extent that only a forced shutdown (holding down
> the power button) gets me out of it. I suspect it's some kind of hardware
> failure, but I'm not 100% sure. It crashes when xenodm is running,
> especially with Firefox, where it crashes instantly. If I disable xenodm it runs
> fine. I am unable to send any log files or anything. I had to send this
> email from a different machine. I can take pictures of log files and
> transfer the information, but I'm not sure where to start. Any feedback
> would be greatly appreciated.

You should have old copies of messages in /var/log:

oc@OpenBSD:~$ ls /var/log/messages*
/var/log/messages  /var/log/messages.1.gz
/var/log/messages.0.gz /var/log/messages.2.gz
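
To search the current and rotated files in one pass, something like the
sketch below might help. The function name scan_messages and the default
pattern are assumptions chosen for this thread (Xorg and amdgpu come from
the reports here; drm and panic are guesses at other useful keywords).

```shell
# Print matching lines from messages and messages.N.gz, prefixed with the
# file name. zcat -f passes uncompressed files through unchanged, so one
# pipeline handles both kinds of file.
scan_messages() {
    dir=${1:-/var/log}
    pattern=${2:-'Xorg|amdgpu|drm|panic'}
    for f in "$dir"/messages*; do
        [ -f "$f" ] || continue
        zcat -f "$f" | grep -E -i -- "$pattern" |
        while IFS= read -r line; do
            printf '%s: %s\n' "${f##*/}" "$line"
        done
    done
}

# Example:
#   scan_messages /var/log
#   scan_messages /var/log 'rwsleep|fsleep'
```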



-- 
Ottavio Caruso



Re: Catastrophic

2020-01-27 Thread Rares Aioanei
A full dmesg would certainly help. Also, do you see anything in
/var/log/messages?

On Mon, Jan 27, 2020 at 4:01 PM Justin Noor wrote:
>
> Hello community,
>
> I'm looking for any advice on how to troubleshoot some strange and
> catastrophic behavior on my OpenBSD machine. Seemingly out of nowhere, it
> started freezing to the extent that only a forced shutdown (holding down
> the power button) gets me out of it. I suspect it's some kind of hardware
> failure, but I'm not 100% sure. It crashes when xenodm is running,
> especially with Firefox, where it crashes instantly. If I disable xenodm it runs
> fine. I am unable to send any log files or anything. I had to send this
> email from a different machine. I can take pictures of log files and
> transfer the information, but I'm not sure where to start. Any feedback
> would be greatly appreciated.
>
> Machine specs:
>
> Version: 6.6 Current (always up-to-date)
> Architecture: amd64
> Kernel: '$ uname -a' OpenBSD myhost.myhost.com 6.6 GENERIC#601 amd64
> Chipset: AMD Ryzen 5
> GPU: Radeon RX 560 series, amdgpu0: msi
>
> Thank you,
>
> Justin Noor