Re: Add Product ID 0xc158 make failure

2022-02-02 Thread Avon Robertson
On Wed, Feb 02, 2022 at 12:29:49PM -, Stuart Henderson wrote:
> On 2022-02-02, Stuart Henderson  wrote:
> > On 2022-02-02, Avon Robertson  wrote:
> >> I am trying to add a new PCI product ID (0xc158) for a serial interface
> >> card to /usr/src/sys/dev/pci/pcidevs.
> >>
> >> For some reason (perhaps an omission or an error on my part), make
> >> fails as shown below, along with other relevant information.
> >>
> >> I would appreciate some guidance and input from anyone who can identify
> >> my omission or error.
> >
> > Not sure what is up here, for what you're trying to do it should just be
> > a case of adding to pcidevs and running "make". Maybe something in mk.conf
> > or your environment but I don't know what it could be. Try running the awk
> > command by hand and see if that gives any clues?
> >
> >
> > I've tried to get a similar device working before and didn't quite manage
> > it, but I'll give a few hints from what I worked out. Unless there were
> > some changes to puc(4) in the meantime which I missed, adding the device
> > ID is just the start.
> >
> >> $ fgrep -C -e c158 pcidevs 
> >> product OXFORD2 OXPCIE952   0xc110  OXPCIE952 Parallel
> >> product OXFORD2 OXPCIE952S  0xc120  OXPCIE952 Serial
> >> product OXFORD2 OXPCIE952S_1 0xc158  OXPCIE952 Serial
> >
> > AFAIK the device you added is the same IC as 0xc120 but is configured to use
> > the native uart rather than the legacy uart (mapped to memory rather
> > than io space); the mode is usually set by either jumpers or SMD resistors
> > depending on the card 
> > (http://www.baddinsbits.altervista.org/pcie952mod.html)
> >
> > The device I tried to get working was OXPCIE954 (configured to native
> > mode) and I had trouble getting the uart working at the right speed,
> > see https://marc.info/?l=openbsd-tech&m=135068369817918&w=2 and
> > the other thread referenced in there.
> >
> > There are some more recent changes for Linux relating to configuring
> > speeds on this family of ICs which are probably worth looking at
> > https://lore.kernel.org/all/alpine.deb.2.21.2107131504270.9...@angie.orcam.me.uk/T/
> >
> >
> 
> Also https://marc.info/?l=openbsd-tech&m=139588720431286&w=2
> 
> -- 
> Please keep replies on the mailing list.
> 

Thank you for the information Stuart. I will send an email in a few
days after applying your information.

Regards Avon
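
Stuart's suggestion to run the awk command by hand can be sketched roughly as below. The helper name check_awk_input is hypothetical and not part of the OpenBSD tree; it only reports whether each file the Makefile rule needs exists and is readable by the current user, so that the real command can then be tried with an explicit ./ path.

```sh
# Hypothetical helper: report whether each named file can be read by
# the current user. Prints one line per file and returns non-zero if
# any file is missing or unreadable.
check_awk_input() {
    rc=0
    for f in "$@"; do
        if [ ! -e "$f" ]; then
            echo "missing: $f"
            rc=1
        elif [ ! -r "$f" ]; then
            echo "unreadable: $f"
            rc=1
        else
            echo "ok: $f"
        fi
    done
    return "$rc"
}

# Intended use, from /usr/src/sys/dev/pci (left commented out so the
# sketch has no side effects when pasted):
#   command -v awk
#   check_awk_input devlist2h.awk pcidevs && awk -f ./devlist2h.awk pcidevs
```

If both files report "ok" and awk still cannot open the script, the cause is more likely in the environment than in the files themselves.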



Add Product ID 0xc158 make failure

2022-02-01 Thread Avon Robertson
Hello,

I am trying to add a new PCI product ID (0xc158) for a serial interface
card to /usr/src/sys/dev/pci/pcidevs.

For some reason (perhaps an omission or an error on my part), make
fails as shown below, along with other relevant information.

I would appreciate some guidance and input from anyone who can identify
my omission or error.

A dmesg is not included to keep this short, but I am able to provide one 
if it is required to identify the reason that make fails. 

$ sysctl kern.version
kern.version=OpenBSD 7.0-current (GENERIC.MP) #298: Mon Jan 31 13:42:43 MST 2022
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

$ pwd
/usr/src/sys/dev/pci

$ cat Makefile
#   $OpenBSD: Makefile,v 1.4 1996/10/14 09:01:34 deraadt Exp $
#   $NetBSD: Makefile,v 1.1 1995/06/18 01:07:04 cgd Exp $

AWK=awk

pcidevs.h pcidevs_data.h: pcidevs devlist2h.awk
/bin/rm -f pcidevs.h pcidevs_data.h
${AWK} -f devlist2h.awk pcidevs

$ make
/bin/rm -f pcidevs.h pcidevs_data.h
awk -f devlist2h.awk pcidevs
awk: can't open file devlist2h.awk
 source line number 1 source file devlist2h.awk
*** Error 2 in /usr/src/sys/dev/pci (Makefile:8 'pcidevs.h')

$ head -n 3 devlist2h.awk
#! /usr/bin/awk -f
#   $OpenBSD: devlist2h.awk,v 1.8 2007/02/21 13:17:28 deraadt Exp $
#   $NetBSD: devlist2h.awk,v 1.2 1996/01/22 21:08:09 cgd Exp $

$ ls -lao pcidevs* devlist*
-rw-rw-r--  1 aer  wsrc  -   5934 Feb  1 16:36 devlist2h.awk
-rw-rw-r--  1 aer  wsrc  - 395169 Feb  1 15:32 pcidevs
-rw-rw-r--  1 aer  wsrc  - 83 Jan 27 18:48 pcidevs.h
-rw-rw-r--  1 aer  wsrc  - 687283 Jan 27 18:48 pcidevs_data.h

$ fgrep -e wsrc /etc/group
wsrc:*:9:aer

$ fgrep -C -e c158 pcidevs 
product OXFORD2 OXPCIE952   0xc110  OXPCIE952 Parallel
product OXFORD2 OXPCIE952S  0xc120  OXPCIE952 Serial
product OXFORD2 OXPCIE952S_1 0xc158  OXPCIE952 Serial

/* Parallels products */

$ dmesg | grep -e Oxford
vendor "Oxford", unknown product 0xc158 (class communications subclass
serial, rev 0x00) at pci4 dev 0 function 0 not configured

As root:
# cd /var/log
# fgrep -R -e devlist2h.awk -e make *

# cd

# pcidump -v 4:0:0
 4:0:0: Oxford unknown
0x0000: Vendor ID: 1415, Product ID: c158
0x0004: Command: , Status: 0010
0x0008: Class: 07 Communications, Subclass: 00 Serial,
Interface: 02, Revision: 00
0x000c: BIST: 00, Header Type: 00, Latency Timer: 00,
Cache Line Size: 10
0x0010: BAR mem 32bit addr: 0xfc80/0x4000
0x0014: BAR mem 32bit addr: 0xfc60/0x0020
0x0018: BAR mem 32bit addr: 0xfc40/0x0020
0x001c: BAR empty ()
0x0020: BAR empty ()
0x0024: BAR empty ()
0x0028: Cardbus CIS: 
0x002c: Subsystem Vendor ID: 1415 Product ID: c158
0x0030: Expansion ROM Base Address: 
0x0038: 
0x003c: Interrupt Pin: 01 Line: 0b Min Gnt: 00 Max Lat: 00
0x0040: Capability 0x01: Power Management
State: D0
0x0070: Capability 0x10: PCI Express
Max Payload Size: 128 / 128 bytes
Max Read Request Size: 512 bytes
Link Speed: 2.5 / 2.5 GT/s
Link Width: x1 / x1
0x0100: Enhanced Capability 0x03: Device Serial Number
Serial Number: 0030e0000150
0x0110: Enhanced Capability 0x04: Power Budgeting
0x00b0: Capability 0x11: Extended Message Signalled Interrupts (MSI-X)
Enabled: no; table size 16 (BAR 1:1781760)

Regards,
   Avon

-- 
aer
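
For orientation, each "product" line in pcidevs carries a vendor symbol, a product symbol, the numeric ID, and a description. The function below is only an approximation of what devlist2h.awk derives from such a line: it is not the real script, the name product_define is hypothetical, and the exact layout of the generated header is an assumption. It may still help to confirm that a new entry has its fields separated by whitespace.

```sh
# Approximate the #define that a pcidevs "product" line leads to.
# Fields: $1 keyword, $2 vendor symbol, $3 product symbol, $4 ID;
# the remaining fields form the description.
product_define() {
    awk '$1 == "product" {
        desc = $5
        for (i = 6; i <= NF; i++) desc = desc " " $i
        printf "#define PCI_PRODUCT_%s_%s %s /* %s */\n", $2, $3, $4, desc
    }'
}

# Example input modelled on the entry discussed in this thread.
printf 'product OXFORD2 OXPCIE952S_1 0xc158 OXPCIE952 Serial\n' | product_define
```

If the product symbol and the ID run together without whitespace, the ID column shifts and the printed line looks visibly wrong, which makes this a quick sanity check before running make.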



Re: drmfreeze

2021-10-22 Thread Avon Robertson
On Fri, Oct 22, 2021 at 10:09:03AM +0200, Emiel Kollof wrote:
> Avon Robertson schreef op vr 22-10-2021 om 15:35 [+1300]:
> 
> 
> > My AMD machine was built by me in August 2018. OpenBSD -current was
> > the first OS installed and has been updated every 1-2 weeks since. It
> > caused me random grief from the time that amdgpu firmware was
> > introduced until I removed it the other day (assuming my memory is
> > still A1-ok).
> 
> I saw you have a Ryzen 7 2700X as well. If you submit your stuff to
> bugs@, I'll add my dmesg as well. Let's get this fixed.
> 
> Regards,
> Emiel
> 

Greetings Emiel,

I will be sending a bug report ASAP. I want to test all of the available
firmware versions as suggested by Chris first. Due to the random time
between freezes on my AMD machine, this will take a few days at least.

Re your other recent email; it is indeed surprising that with radeondrm
firmware instead of amdgpu firmware the card performs well. Indeed, it
is once again rock solid stable. The machine does record 4 drm errors
in the dmesg however. They will be included in my formal bug report.

Regards,
Avon

-- 
aer



Re: drmfreeze

2021-10-21 Thread Avon Robertson
On Thu, Oct 21, 2021 at 03:12:33PM -0700, Chris Cappuccio wrote:
> Emiel Kollof [em...@kollof.nl] wrote:
> > Chris Cappuccio schreef op do 21-10-2021 om 07:56 [-0700]:
> > 
> > > This appears to be a totally different failure than found by Avon.
> > > It's also on a different class of hardware.
> > 
> > Is it? The errors seem very similar. Also, Avon's dmesg suggests he's
> > using amdgpu as well. I wonder why he deletes all the amdgpu firmwares
> > only to copy over the radeon firmwares. For me, that would break
> > xenocara.
> > 
> 
> The errors are totally unrelated in my view. Other than having the
> same formatting, as they are both generated from the amdgpu framework,
> they appear to be completely different.
> 
> Anyways, I didn't even catch that I recommended the wrong firmware. 
> I'm trying to get some traction here so that someone who actually
> knows more about amdgpu might find a useful report. (A driver bug
> report with no dmesg is no report!)
> 
> Under radeondrm, some cards require firmware, most don't. With
> amdgpu, seems like at least this card doesn't.
> 
> Avon if you want to try older amdgpu firmwares,
> 
> 7.0: http://firmware.openbsd.org/firmware/7.0/amdgpu-firmware-20210818.tgz
> 6.9: http://firmware.openbsd.org/firmware/6.9/amdgpu-firmware-20201218.tgz
> 6.8: http://firmware.openbsd.org/firmware/6.8/amdgpu-firmware-20200619.tgz
> 6.7: http://firmware.openbsd.org/firmware/6.7/amdgpu-firmware-20190312.tgz
> 
> This is not important at this point, it's just another data point
> in your bug report to know which firmwares succeed and which fail.
> The right place for the report, including dmesg, is b...@openbsd.org.
> 
> Chris

Greetings Chris,

My AMD machine was built by me in August 2018. OpenBSD -current was the
first OS installed and has been updated every 1-2 weeks since. It caused
me random grief from the time that amdgpu firmware was introduced until
I removed it the other day (assuming my memory is still A1-ok). Alas,
there were occasionally several days between freezes, which may
complicate tracking it down.

Considering my August 2018 build date, I will install the 6.7 firmware
first and work upwards through the amdgpu firmware, and stop testing
them as soon as one creates a freeze. I am sure they all did in the
past.

Over the next 2-3 days as time permits, I will collate any related
information that I can find and submit it to bugs@ as you have
suggested.

Thanks and regards,
Avon

-- 
aer
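
The plan above (start with the 6.7 firmware and work upwards) can be written down as a small loop. The release and date pairs are the ones Chris listed; the function name firmware_urls is hypothetical, and the sketch only prints the URLs in test order. It downloads and installs nothing.

```sh
# Print the amdgpu firmware URLs from the thread, oldest release first.
firmware_urls() {
    base=http://firmware.openbsd.org/firmware
    for pair in 6.7:20190312 6.8:20200619 6.9:20201218 7.0:20210818; do
        rel=${pair%%:*}     # text before the colon, e.g. 6.7
        stamp=${pair#*:}    # text after the colon, e.g. 20190312
        echo "$base/$rel/amdgpu-firmware-$stamp.tgz"
    done
}

firmware_urls
```

Noting which entry in this list was installed at the time of each freeze gives the per-firmware data point that Chris asked for in the bug report.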



Re: drmfreeze

2021-10-21 Thread Avon Robertson
Hello Emiel,

Please read my inline and other comments.

On Thu, Oct 21, 2021 at 09:55:46AM +0200, Emiel Kollof wrote:
> Avon Robertson schreef op 2021-10-20 20:31:
> 
> >As suggested above by Chris:
> >1. Downloaded radeondrm-firmware-20181218.tgz to ~/download/.
> >2. # rm -fr /etc/firmware/amdgpu/*
> >3. # tar -C /etc -xzvf ~/download/radeondrm-firmware-20181218.tgz
> 
> This is not right. Why delete all the amdgpu firmwares? The radeondrm
> ones don't replace them.
>
 
Why is the above not right?

> (btw, downgrading didn't work for my Navi10 card, same freezes, same
> errors in dmesg, on 7.0 and on snap)
> 
> Regards,
> Emiel

-- 
aer

My AMD machine was rock solid stable prior to the change from radeon to
amdgpu firmware many months ago. As there have been no further freezes
since reverting to radeon firmware, I do not foresee reinstalling the
amdgpu firmware until I believe the amdgpu bugs have been fixed.

The latest email to you from Chris contains a more helpful reply than
I could give.

Avon



Re: drmfreeze

2021-10-20 Thread Avon Robertson
On Tue, Oct 19, 2021 at 06:51:14PM +1300, Avon Robertson wrote:
> On Mon, Oct 18, 2021 at 03:52:34PM -0700, Chris Cappuccio wrote:
> > Here's a thread where people are seeing similar hangs on similar hardware 
> > under Linux:
> > 
> > https://bugs.freedesktop.org/show_bug.cgi?id=108900
> > 
> > "VERIFIED WONTFIX" because the kernel driver is probably closer to the 
> > issue, not Mesa
> > 
> > here's another: https://bugzilla.kernel.org/show_bug.cgi?id=201957
> > 
> > The "fix" in the second one seems to be downgrading to an earlier firmware.
> > 
> > Perhaps you can try to do that, the older radeondrm firmwares should be 
> > available from 
> > http://firmware.openbsd.org/firmware/6.8/radeondrm-firmware-20181218.tgz
> > 
> > You can just unpack various versions and place the appropriate files under 
> > /etc/firmware for quick testing...
> > 
> > I have no idea if the 20181218 is the right version to test, it may not be. 
> > It's just the first earlier version available on firmware.openbsd.org. You 
> > may need to grab a different version from a Linux repository.
> > 
> > Chris
> > 
> > Avon Robertson [avo...@xtra.co.nz] wrote:
> > > 
> > > drm:pid70131:gmc_v8_0_process_interrupt *ERROR* GPU fault detected: 147 
> > > 0x0582c802 for process  pid 0 thread Xorg pid 9818
> > > drm:pid70131:gmc_v8_0_process_interrupt *ERROR*   
> > > VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x86B0
> > > drm:pid70131:gmc_v8_0_process_interrupt *ERROR*   
> > > VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x070C8002
> > > drm:pid70131:gmc_v8_0_vm_decode_fault *ERROR* VM fault (0x02, vmid 3, 
> > > pasid 32769) at page 34480, write from 'TC2' (0x54433200) (200)
> > > [drm] *ERROR* Illegal register access in command stream
> > > [drm] *ERROR* ring gfx timeout, signaled seq=36730, emitted seq=36732
> > > [drm] *ERROR* Process information: process  pid 0 thread Xorg pid 9818
> > > amdgpu_device_suspend_display_audio: stub
> > > amdgpu: cp is busy, skip halt cp
> > > amdgpu: rlc is busy, skip halt rlc
> > > [drm] *ERROR* Failed to initialize parser -88!
> > >   
> > > [drm] *ERROR* Failed to initialize parser -88!
> > > [drm] *ERROR* ring gfx timeout, signaled seq=36733, emitted seq=36733
> > > [drm] *ERROR* Process information: process  pid 0 thread Xorg pid 9818
> > > amdgpu_device_suspend_display_audio: stub
> > > [drm] *ERROR* Failed to initialize parser -88!
> > >   
> > > [drm] *ERROR* Failed to initialize parser -88!
> > > usb_insert_transfer: xfer=0xfd901e91d000 not free
> > > ucomstart: err=INVAL
> > > ucom1: read start failed
> > > [drm] *ERROR* Failed to initialize parser -88!
> > >   
> > > [drm] *ERROR* Failed to initialize parser -88!
> > > 
> > 
> 
> -- 
> 
> Thank you for the above reference information URLs Chris. I will see if
> any fix they suggest fixes the freezes that I have been experiencing.
> 
> 
-- 
aer

Greetings Chris and misc@,

As suggested above by Chris:
1. Downloaded radeondrm-firmware-20181218.tgz to ~/download/.
2. # rm -fr /etc/firmware/amdgpu/*
3. # tar -C /etc -xzvf ~/download/radeondrm-firmware-20181218.tgz

The machine has been running without freezing for almost 24 hours,
so the radeon firmware may be a permanent fix. If the freezes occur
again I will send another email to this thread.

Is there some way to prevent the amdgpu firmware from being reinstalled
or updated during the reboot that follows a snapshot upgrade?
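
Steps 2 and 3 above delete the amdgpu firmware outright. A more cautious variant, sketched here under the assumption that a plain copy of the directory is enough to restore it later, keeps a backup first. The helper name backup_firmware and the backup location in the usage comment are hypothetical; neither comes from any OpenBSD tool.

```sh
# Hypothetical helper: copy the contents of a firmware directory to a
# backup directory before the originals are removed.
# $1 = firmware directory, $2 = backup directory (created if absent).
backup_firmware() {
    src=$1
    dst=$2
    if [ ! -d "$src" ]; then
        echo "no such directory: $src" >&2
        return 1
    fi
    # "$src"/. copies the directory's contents, hidden files included.
    mkdir -p "$dst" && cp -Rp "$src"/. "$dst"/
}

# Intended use as root, before steps 2 and 3 (commented out on purpose):
#   backup_firmware /etc/firmware/amdgpu /root/amdgpu-firmware.bak
```

With the backup in place, returning to the amdgpu firmware for a later test is a matter of copying the files back rather than downloading them again.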



Re: drmfreeze

2021-10-18 Thread Avon Robertson
On Mon, Oct 18, 2021 at 03:52:34PM -0700, Chris Cappuccio wrote:
> Here's a thread where people are seeing similar hangs on similar hardware 
> under Linux:
> 
> https://bugs.freedesktop.org/show_bug.cgi?id=108900
> 
> "VERIFIED WONTFIX" because the kernel driver is probably closer to the issue, 
> not Mesa
> 
> here's another: https://bugzilla.kernel.org/show_bug.cgi?id=201957
> 
> The "fix" in the second one seems to be downgrading to an earlier firmware.
> 
> Perhaps you can try to do that, the older radeondrm firmwares should be 
> available from 
> http://firmware.openbsd.org/firmware/6.8/radeondrm-firmware-20181218.tgz
> 
> You can just unpack various versions and place the appropriate files under 
> /etc/firmware for quick testing...
> 
> I have no idea if the 20181218 is the right version to test, it may not be. 
> It's just the first earlier version available on firmware.openbsd.org. You 
> may need to grab a different version from a Linux repository.
> 
> Chris
> 
> Avon Robertson [avo...@xtra.co.nz] wrote:
> > 
> > drm:pid70131:gmc_v8_0_process_interrupt *ERROR* GPU fault detected: 147 
> > 0x0582c802 for process  pid 0 thread Xorg pid 9818
> > drm:pid70131:gmc_v8_0_process_interrupt *ERROR*   
> > VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x86B0
> > drm:pid70131:gmc_v8_0_process_interrupt *ERROR*   
> > VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x070C8002
> > drm:pid70131:gmc_v8_0_vm_decode_fault *ERROR* VM fault (0x02, vmid 3, pasid 
> > 32769) at page 34480, write from 'TC2' (0x54433200) (200)
> > [drm] *ERROR* Illegal register access in command stream
> > [drm] *ERROR* ring gfx timeout, signaled seq=36730, emitted seq=36732
> > [drm] *ERROR* Process information: process  pid 0 thread Xorg pid 9818
> > amdgpu_device_suspend_display_audio: stub
> > amdgpu: cp is busy, skip halt cp
> > amdgpu: rlc is busy, skip halt rlc
> > [drm] *ERROR* Failed to initialize parser -88!
> > 
> > [drm] *ERROR* Failed to initialize parser -88!
> > [drm] *ERROR* ring gfx timeout, signaled seq=36733, emitted seq=36733
> > [drm] *ERROR* Process information: process  pid 0 thread Xorg pid 9818
> > amdgpu_device_suspend_display_audio: stub
> > [drm] *ERROR* Failed to initialize parser -88!
> > 
> > [drm] *ERROR* Failed to initialize parser -88!
> > usb_insert_transfer: xfer=0xfd901e91d000 not free
> > ucomstart: err=INVAL
> > ucom1: read start failed
> > [drm] *ERROR* Failed to initialize parser -88!
> > 
> > [drm] *ERROR* Failed to initialize parser -88!
> > 
> 

-- 

Thank you for the above reference information URLs Chris. I will see if
any fix they suggest fixes the freezes that I have been experiencing.
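
When collecting the dmesg for a report like this one, it can help to pull out just the graphics-related lines. The two functions below are a minimal sketch with hypothetical names; the sample text is copied from the errors quoted above, and on a live system the input would be the output of dmesg instead.

```sh
# Keep only lines that mention drm or amdgpu; reads standard input.
drm_lines() {
    grep -E 'drm|amdgpu'
}

# Count the lines flagged *ERROR*; reads standard input.
drm_error_count() {
    grep -c -F '*ERROR*'
}

# Three lines taken from the dmesg excerpt in this thread.
sample='[drm] *ERROR* ring gfx timeout, signaled seq=36730, emitted seq=36732
amdgpu: cp is busy, skip halt cp
ucomstart: err=INVAL'

printf '%s\n' "$sample" | drm_lines
```

On the affected machine the equivalent would be "dmesg | drm_lines", with the result pasted into the report alongside the full dmesg.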




Re: drmfreeze

2021-10-16 Thread Avon Robertson
On Sat, Oct 16, 2021 at 08:27:03PM +1300, Avon Robertson wrote:
> Avon Robertson [avo...@xtra.co.nz] wrote:
> > Hello misc@,
> > 
> > Earlier today an AMD host I have froze again. I ssh'd into the host
> > and retrieved the output from dmesg, /var/log/messages, and
> > /var/run/dmesg.boot.
> > 
> > I found nothing of note in $HOME/.local/share/xorg/Xorg.0.log.
> > 
> > At the time of the freeze the ksh script I use to update my local /cvs
> > repository was the only programme executing inside the rightmost pane
> > of a 3 pane tmux session. I have a log of the output produced by this
> > script which is probably of no use to those who have been trying to
> > isolate and fix this bug for many months.
> > 
> > Please advise if any of the above is of use or interest to anyone, and
> > if so to which list should I post it.
> > 
> 
> Chris Cappuccio replied with:
> 
> posting the dmesg to this list would be a good start
> 
> Thank you for your reply Chris. I recommend that you read the below
> dmesg from the bottom to the top.
> 
> 
> OpenBSD 7.0 (GENERIC.MP) #212: Mon Sep 13 11:09:43 MDT 2021
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

Hello Misc@,

In order to prevent this email becoming a "TLDR" document, please read
earlier posts in this thread to view the above #212 dmesg contents.

Below is yet another dmesg containing similar errors that are reported
near the bottom of the entire dmesg output.

As the latest freeze occurred yesterday, I have kept the machine
running. I can access the frozen machine via ssh so if any interest is
shown w.r.t. providing additional information regarding the freeze
within the next 12 hours, I will try to provide it.

OpenBSD 7.0-current (GENERIC.MP) #40: Fri Oct 15 09:29:25 MDT 2021
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 68647477248 (65467MB)
avail mem = 66550951936 (63467MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xe8980 (59 entries)
bios0: vendor American Megatrends Inc. version "F2" date 03/14/2018
bios0: Gigabyte Technology Co., Ltd. X470 AORUS ULTRA GAMING
acpi0 at bios0: ACPI 6.0
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP APIC FPDT FIDT SSDT SSDT CRAT CDIT SSDT MCFG HPET SSDT 
UEFI BGRT IVRS SSDT SSDT WSMT
acpi0: wakeup devices GPP0(S4) GPP1(S4) GPP3(S4) GPP4(S4) GPP5(S4) GPP6(S4) 
GPP7(S4) GPP8(S4) GPP9(S4) GPPA(S4) GPPB(S4) GPPC(S4) GPPD(S4) GPPE(S4) 
GPPF(S4) GP17(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 32 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD Ryzen 7 2700X Eight-Core Processor, 3700.62 MHz, 17-08-02
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,IBPB,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu0: 64KB 64b/line 4-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 
8-way L2 cache
cpu0: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu0: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 100MHz
cpu0: mwait min=64, max=64, IBE
cpu1 at mainbus0: apid 1 (application processor)
cpu1: AMD Ryzen 7 2700X Eight-Core Processor, 3700.03 MHz, 17-08-02
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,IBPB,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu1: 64KB 64b/line 4-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 
8-way L2 cache
cpu1: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu1: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 2 (application processor)
cpu2: AMD Ryzen 7 2700X Eight-Core Processor, 3700.02 MHz, 17-08-02
cpu2: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,SKINIT,TCE,TOPEXT,C

drmfreeze

2021-10-16 Thread Avon Robertson
Avon Robertson [avo...@xtra.co.nz] wrote:
> Hello misc@,
> 
> Earlier today an AMD host I have froze again. I ssh'd into the host
> and retrieved the output from dmesg, /var/log/messages, and
> /var/run/dmesg.boot.
> 
> I found nothing of note in $HOME/.local/share/xorg/Xorg.0.log.
> 
> At the time of the freeze the ksh script I use to update my local /cvs
> repository was the only programme executing inside the rightmost pane
> of a 3 pane tmux session. I have a log of the output produced by this
> script which is probably of no use to those who have been trying to
> isolate and fix this bug for many months.
> 
> Please advise if any of the above is of use or interest to anyone, and
> if so to which list should I post it.
> 

Chris Cappuccio replied with:

posting the dmesg to this list would be a good start

Thank you for your reply Chris. I recommend that you read the below
dmesg from the bottom to the top.


OpenBSD 7.0 (GENERIC.MP) #212: Mon Sep 13 11:09:43 MDT 2021
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 68647477248 (65467MB)
avail mem = 66550976512 (63467MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xe8980 (59 entries)
bios0: vendor American Megatrends Inc. version "F2" date 03/14/2018
bios0: Gigabyte Technology Co., Ltd. X470 AORUS ULTRA GAMING
acpi0 at bios0: ACPI 6.0
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP APIC FPDT FIDT SSDT SSDT CRAT CDIT SSDT MCFG HPET SSDT 
UEFI BGRT IVRS SSDT SSDT WSMT
acpi0: wakeup devices GPP0(S4) GPP1(S4) GPP3(S4) GPP4(S4) GPP5(S4) GPP6(S4) 
GPP7(S4) GPP8(S4) GPP9(S4) GPPA(S4) GPPB(S4) GPPC(S4) GPPD(S4) GPPE(S4) 
GPPF(S4) GP17(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 32 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD Ryzen 7 2700X Eight-Core Processor, 3700.62 MHz, 17-08-02
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,IBPB,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu0: 64KB 64b/line 4-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 
8-way L2 cache
cpu0: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu0: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 100MHz
cpu0: mwait min=64, max=64, IBE
cpu1 at mainbus0: apid 1 (application processor)
cpu1: AMD Ryzen 7 2700X Eight-Core Processor, 3700.02 MHz, 17-08-02
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,IBPB,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu1: 64KB 64b/line 4-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 
8-way L2 cache
cpu1: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu1: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 2 (application processor)
cpu2: AMD Ryzen 7 2700X Eight-Core Processor, 3700.02 MHz, 17-08-02
cpu2: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,IBPB,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu2: 64KB 64b/line 4-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 
8-way L2 cache
cpu2: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu2: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu2: smt 0, core 2, package 0
cpu3 at mainbus0: apid 3 (application processor)
cpu3: AMD Ryzen 7 2700X Eight-Core Processor, 3700.02 MHz, 17-08-02
cpu3: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI

drmfreeze

2021-10-06 Thread Avon Robertson
Hello misc@,

Earlier today an AMD host I have froze again. I ssh'd into the host
and retrieved the output from dmesg, /var/log/messages, and
/var/run/dmesg.boot.

I found nothing of note in $HOME/.local/share/xorg/Xorg.0.log.

At the time of the freeze the ksh script I use to update my local /cvs
repository was the only programme executing inside the rightmost pane
of a 3 pane tmux session. I have a log of the output produced by this
script which is probably of no use to those who have been trying to
isolate and fix this bug for many months.

Please advise if any of the above is of use or interest to anyone, and
if so to which list should I post it.

aer
-- 



/usr/src/sys/dev/pci/pcidevs enigma

2021-08-02 Thread Avon Robertson
Hello misc@,
I seek help solving the below mystery. Thank you all in advance.

Two amd64 hosts have the same change (add 0xc158 support) in
file /usr/src/sys/dev/pci/pcidevs. On host z97st, 'make' runs
successfully in directory /usr/src/sys/dev/pci, whilst host gx470
reports an error for the same change (see below).

NB: The user prompt on both hosts is split over two lines.
The gx470 /var/run/dmesg.boot is appended last below.

gx470 host info

gx470://usr/src/sys/dev/pci
$ date
Tue Aug  3 13:24:08 NZST 2021
gx470://usr/src/sys/dev/pci
$ sysctl kern.version
kern.version=OpenBSD 6.9-current (GENERIC.MP) #159: Sun Aug  1 08:49:29
MDT 2021
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

gx470://usr/src/sys/dev/pci
$ ls -la /usr/src
total 120
drwxrwxr-x   17 root  wsrc 512 Aug  3 11:31 .
drwxr-xr-x   19 root  wheel512 Aug  2 03:22 ..
-rw-rw-r--1 aer   wsrc   7 May 26 07:06 .gitignore
drwxrwxr-x2 aer   wsrc 512 Aug  3 11:31 CVS
-rw-rw-r--1 aer   wsrc3639 Apr  6  2020 Makefile
-rw-rw-r--1 aer   wsrc   16103 May  3 12:04 Makefile.cross


gx470://usr/src/sys/dev/pci
$ grep -e PCIE952 pcidevs
product OXFORD2 OXPCIE952   0xc110  OXPCIE952 Parallel
product OXFORD2 OXPCIE952S  0xc120  OXPCIE952 Serial
product OXFORD2 OXPCIE952S_1 0xc158  OXPCIE952 Serial

gx470://usr/src/sys/dev/pci
$ touch pcidevs

gx470://usr/src/sys/dev/pci
$ make
/bin/rm -f pcidevs.h pcidevs_data.h
awk -f devlist2h.awk pcidevs
awk: can't open file devlist2h.awk
 source line number 1 source file devlist2h.awk
*** Error 2 in /usr/src/sys/dev/pci (Makefile:8 'pcidevs.h')

gx470://usr/src/sys/dev/pci
$ ls -lo Makefile devlist2h.awk pcidevs
-rw-rw-r--  1 aer  wsrc  -246 Oct 14  1996 Makefile
-rw-rw-r--  1 aer  wsrc  -   5934 Feb 22  2007 devlist2h.awk
-rw-rw-r--  1 aer  wsrc  - 385130 Aug  3 13:44 pcidevs

z97st host info

z97st://usr/src/sys/dev/pci
$ date
Tue Aug  3 13:37:33 NZST 2021
z97st://usr/src/sys/dev/pci
$ sysctl kern.version
kern.version=OpenBSD 6.9-current (GENERIC.MP) #159: Sun Aug  1 08:49:29 MDT 2021
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

z97st://usr/src/sys/dev/pci
$ ls -la /usr/src
total 120
drwxrwxr-x   17 root  wsrc 512 Jul 27 09:47 .
drwxr-xr-x   19 root  wheel512 Aug  2 03:22 ..
-rw-rw-r--1 aer   wsrc   7 May 26 07:06 .gitignore
drwxrwxr-x2 aer   wsrc 512 Jul 27 09:47 CVS
-rw-rw-r--1 aer   wsrc3639 Apr  6  2020 Makefile
-rw-rw-r--1 aer   wsrc   16103 May  3 12:04 Makefile.cross


z97st://usr/src/sys/dev/pci
$ grep -e PCIE952 pcidevs
product OXFORD2 OXPCIE952   0xc110  OXPCIE952 Parallel
product OXFORD2 OXPCIE952S  0xc120  OXPCIE952 Serial
product OXFORD2 OXPCIE952S_1 0xc158  OXPCIE952 Serial

z97st://usr/src/sys/dev/pci
$ make
/bin/rm -f pcidevs.h pcidevs_data.h
awk -f devlist2h.awk pcidevs

z97st://usr/src/sys/dev/pci
$ ls -lo Makefile devlist2h.awk pcidevs 
-rw-rw-r--  1 aer  wsrc  -246 Oct 14  1996 Makefile
-rw-rw-r--  1 aer  wsrc  -   5934 Feb 22  2007 devlist2h.awk
-rw-rw-r--  1 aer  wsrc  - 385130 Aug  3 13:43 pcidevs

Run diff against files.

rsync -av ... commands were used to get the above 3 files from each
host into the same directory to enable diff to be easily run against
them. Host z97st files then had their host ID appended to them.

gx470://home/aer/tmp
$ ls -l Make* devlist* pcidev*
-rw-rw-r--  1 aer  wsrc 246 Oct 14  1996 Makefile
-rw-rw-r--  1 aer  wsrc 246 Oct 14  1996 Makefile.z97st
-rw-rw-r--  1 aer  wsrc5934 Feb 22  2007 devlist2h.awk
-rw-rw-r--  1 aer  wsrc5934 Feb 22  2007 devlist2h.awk.z97st
-rw-rw-r--  1 aer  wsrc  385130 Aug  3 13:44 pcidevs
-rw-rw-r--  1 aer  wsrc  385130 Aug  3 13:43 pcidevs.z97st

gx470://home/aer/tmp
$ diff Makefile Makefile.z97st
gx470://home/aer/tmp
$ diff devlist2h.awk devlist2h.awk.z97st
gx470://home/aer/tmp
$ diff pcidevs pcidevs.z97st
gx470://home/aer/tmp
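
Since diff finds the three files identical on both hosts, one further thing that could be compared is what diff cannot see: the permissions of every directory on the way to the file. A directory without search (x) permission for the current user makes files below it unopenable even when the files themselves are readable. Whether that is the cause here is not known; the function name path_perms is hypothetical and this is only a sketch of the check.

```sh
# Print "ls -ld" for an absolute path and for each of its parent
# directories, ending with /.
path_perms() {
    p=$1
    while :; do
        ls -ld "$p"
        # Stop at the root; also stop at "." so that a relative
        # path cannot loop forever.
        case $p in
            /|.) break ;;
        esac
        p=$(dirname "$p")
    done
}

# Intended use on each host, then compare the two listings:
#   path_perms /usr/src/sys/dev/pci/devlist2h.awk
```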
 

OpenBSD 6.9-current (GENERIC.MP) #159: Sun Aug  1 08:49:29 MDT 2021
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 68647477248 (65467MB)
avail mem = 66551033856 (63468MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xe8980 (59 entries)
bios0: vendor American Megatrends Inc. version "F2" date 03/14/2018
bios0: Gigabyte Technology Co., Ltd. X470 AORUS ULTRA GAMING
acpi0 at bios0: ACPI 6.0
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP APIC FPDT FIDT SSDT SSDT CRAT CDIT SSDT MCFG HPET SSDT 
UEFI BGRT IVRS SSDT SSDT WSMT
acpi0: wakeup devices GPP0(S4) GPP1(S4) GPP3(S4) GPP4(S4) GPP5(S4) GPP6(S4) 
GPP7(S4) GPP8(S4) GPP9(S4) GPPA(S4) GPPB(S4) GPPC(S4) GPPD(S4) GPPE(S4) 
GPPF(S4) GP17(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 32 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD Ryzen 7 2700X Eight-Core Processor, 3700.62 MHz, 17-08-02
cpu0: 

Re: reposync:host key verification failed

2021-06-10 Thread Avon Robertson
On Thu, Jun 10, 2021 at 11:06:46AM -, Stuart Henderson wrote:
> On 2021-06-09, Avon Robertson  wrote:
> > On Tue, Jun 08, 2021 at 11:11:15AM +1200, Avon Robertson wrote:
> >> On Mon, Jun 07, 2021 at 08:21:24PM -, Stuart Henderson wrote:
> >> > On 2021-06-07, Avon Robertson  wrote:
> >> > > $ make obj
> >> > >===> ssh
> >> > > /usr/src/usr.bin/ssh/ssh/obj -> /usr/obj/usr.bin/ssh/ssh
> >> > > mkdir: /usr/obj/usr.bin: Permission denied
> >> > > *** Error 1 in ssh (:61 'obj': @cd /usr/src/usr.bin/ssh/ssh;
> >> > > umask 007;  here=`/bin/pwd`; bsdsrcdir=`cd /usr/src; /bin/pwd`;  s...)
> >> > > *** Error 2 in /usr/src/usr.bin/ssh (:48 'obj': @for
> >> > > entry in ssh sshd ssh-add ssh-keygen ssh-agent scp sftp-server
> >> > > ssh-keys...)
> >> > >
> >> > > Mmmm. So looked first at permission in and below /usr/src. Found
> >> > > permissions to be 700 with owner and group being aer:wsrc. As root,
> >> > > # chmod -R 775 /usr/src
> >> > > and tried 'make obj' again. The same error as above was output.
> >> > 
> >> > The "permission denied" is on /usr/obj.
> >> > 
> >> > > I do not rule out the possibility that my local /cvs repository has
> >> > > been inadvertently corrupted by me.
> >> > 
> >> > unlikely.
> >> > 
> >> > > Theo, I am willing to install (not update) a later snapshot and try to
> >> > > build a test kernel for you tomorrow; if you believe it likely my /cvs
> >> > > repo is ok. If you think it likely that my repo is corrupt, I will
> >> > > remove it and reinstall a local repo from scratch before trying to
> >> > > build a test kernel for you.
> >> > 
> >> > I think at this point the best thing to do is simply update to a newer
> >> > snapshot and try reposync again. (Update is fine, no need to reinstall).
> >> > No need to build a kernel.
> >> > 
> >> > If there is still a failure then adjust permissions or group membership
> >> > so you can write to /usr/obj (there are various methods that will work),
> >> > and confirm that it works with a build of ssh fresh from cvs. But if I got
> >> > my testing right then I think this is now working.
> >> > 
> >> > 
> >> Many thanks Stuart.
> >> Will do as you have suggested.
> >> 
> >> Regards Avon
> >> -- 
> >> 
> >
> > Hello Stuart and misc@,
> > Installed new snapshot:
> > $ uname -prsv
> > OpenBSD 6.9 GENERIC.MP#58 amd64
> >
> > My script failed again with error:
> > reposync: host key verification failed - see
> > /var/db/reposync/known_hosts
> >
> > After executing
> > $ cd /usr/src/usr.bin/ssh
> > $ cvs up
> > $ make obj
> > $ make
> > $ doas make install
> > my script is working again without error.
> >
> > Thank you all for your help.
> >
> > Regards Avon
> >
> >
> 
> It should work OK with snapshots dated after 2021/06/08.
> 
> btw for future reference, the GENERIC.MP#58 isn't very useful for
> identification; it's better to use "sysctl kern.version".
> 
> 
Have noted "sysctl kern.version".

Thanks Stuart.

-- 
aer
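
As a small illustration of the suggestion above: the first line of the kern.version string has the same form as the dmesg banner, so it carries the build date as well as the build number. The sketch below pulls the date out of such a string. The function name build_date is hypothetical, and the sed expression assumes the "#NN: date" layout seen in the dmesg output elsewhere in this thread.

```sh
#!/bin/sh
# build_date STRING
# Print the build date from the first line of a kern.version string,
# i.e. everything after "#<build number>: ".
build_date() {
    printf '%s\n' "$1" | sed -n '1s/^.*#[0-9]*: //p'
}

# On an OpenBSD machine this could be used as:
#   build_date "$(sysctl -n kern.version)"
```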



Re: reposync:host key verification failed

2021-06-09 Thread Avon Robertson
On Tue, Jun 08, 2021 at 11:11:15AM +1200, Avon Robertson wrote:
> On Mon, Jun 07, 2021 at 08:21:24PM -, Stuart Henderson wrote:
> > On 2021-06-07, Avon Robertson  wrote:
> > > $ make obj
> > >===> ssh
> > > /usr/src/usr.bin/ssh/ssh/obj -> /usr/obj/usr.bin/ssh/ssh
> > > mkdir: /usr/obj/usr.bin: Permission denied
> > > *** Error 1 in ssh (:61 'obj': @cd /usr/src/usr.bin/ssh/ssh;
> > > umask 007;  here=`/bin/pwd`; bsdsrcdir=`cd /usr/src; /bin/pwd`;  s...)
> > > *** Error 2 in /usr/src/usr.bin/ssh (:48 'obj': @for
> > > entry in ssh sshd ssh-add ssh-keygen ssh-agent scp sftp-server
> > > ssh-keys...)
> > >
> > > Mmmm. So looked first at permission in and below /usr/src. Found
> > > permissions to be 700 with owner and group being aer:wsrc. As root,
> > > # chmod -R 775 /usr/src
> > > and tried 'make obj' again. The same error as above was output.
> > 
> > The "permission denied" is on /usr/obj.
> > 
> > > I do not rule out the possibility that my local /cvs repository has
> > > been inadvertently corrupted by me.
> > 
> > unlikely.
> > 
> > > Theo, I am willing to install (not update) a later snapshot and try to
> > > > build a test kernel for you tomorrow; if you believe it likely my /cvs
> > > repo is ok. If you think it likely that my repo is corrupt, I will
> > > remove it and reinstall a local repo from scratch before trying to
> > > build a test kernel for you.
> > 
> > I think at this point the best thing to do is simply update to a newer
> > snapshot and try reposync again. (Update is fine, no need to reinstall).
> > No need to build a kernel.
> > 
> > If there is still a failure then adjust permissions or group membership
> > so you can write to /usr/obj (there are various methods that will work),
> > and confirm that it works with a build of ssh fresh from cvs. But if I got
> > my testing right then I think this is now working.
> > 
> > 
> Many thanks Stuart.
> Will do as you have suggested.
> 
> Regards Avon
> -- 
> 

Hello Stuart and misc@,
Installed new snapshot:
$ uname -prsv
OpenBSD 6.9 GENERIC.MP#58 amd64

My script failed again with error:
reposync: host key verification failed - see
/var/db/reposync/known_hosts

After executing
$ cd /usr/src/usr.bin/ssh
$ cvs up
$ make obj
$ make
$ doas make install
my script is working again without error.

Thank you all for your help.

Regards Avon
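
As pointed out earlier in the thread, the "Permission denied" from make obj was on /usr/obj, not /usr/src. A quick pre-check along the following lines shows whether the current user can write there before starting a build. This is a sketch only; the function name check_objdir is hypothetical and /usr/obj is simply the default taken from the error message above.

```sh
#!/bin/sh
# check_objdir [DIR]
# Succeed only if DIR (default /usr/obj) exists and is writable by the
# current user; otherwise report the user and group memberships.
check_objdir() {
    dir=${1:-/usr/obj}
    if [ -d "$dir" ] && [ -w "$dir" ]; then
        printf '%s: writable by %s\n' "$dir" "$(id -un)"
    else
        printf '%s: not writable by %s (groups: %s)\n' \
            "$dir" "$(id -un)" "$(id -Gn)" >&2
        return 1
    fi
}
```

It could be combined with the build step as: check_objdir /usr/obj && make obj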



Re: reposync:host key verification failed

2021-06-07 Thread Avon Robertson
On Mon, Jun 07, 2021 at 08:21:24PM -, Stuart Henderson wrote:
> On 2021-06-07, Avon Robertson  wrote:
> > $ make obj
> >===> ssh
> > /usr/src/usr.bin/ssh/ssh/obj -> /usr/obj/usr.bin/ssh/ssh
> > mkdir: /usr/obj/usr.bin: Permission denied
> > *** Error 1 in ssh (:61 'obj': @cd /usr/src/usr.bin/ssh/ssh;
> > umask 007;  here=`/bin/pwd`; bsdsrcdir=`cd /usr/src; /bin/pwd`;  s...)
> > *** Error 2 in /usr/src/usr.bin/ssh (:48 'obj': @for
> > entry in ssh sshd ssh-add ssh-keygen ssh-agent scp sftp-server
> > ssh-keys...)
> >
> > Mmmm. So looked first at permission in and below /usr/src. Found
> > permissions to be 700 with owner and group being aer:wsrc. As root,
> > # chmod -R 775 /usr/src
> > and tried 'make obj' again. The same error as above was output.
> 
> The "permission denied" is on /usr/obj.
> 
> > I do not rule out the possibility that my local /cvs repository has
> > been inadvertently corrupted by me.
> 
> unlikely.
> 
> > Theo, I am willing to install (not update) a later snapshot and try to
> > build a test kernel for you tomorrow; if you believe it likely my /cvs
> > repo is ok. If you think it likely that my repo is corrupt, I will
> > remove it and reinstall a local repo from scratch before trying to
> > build a test kernel for you.
> 
> I think at this point the best thing to do is simply update to a newer
> snapshot and try reposync again. (Update is fine, no need to reinstall).
> No need to build a kernel.
> 
> If there is still a failure then adjust permissions or group membership
> so you can write to /usr/obj (there are various methods that will work),
> and confirm that it works with a build of ssh fresh from cvs. But if I got
> my testing right then I think this is now working.
> 
> 
Many thanks Stuart.
Will do as you have suggested.

Regards Avon
-- 



Re: reposync:host key verification failed

2021-06-07 Thread Avon Robertson
Hello again Theo, Stuart, and naddy,

Please view my findings at the end of this post.

On Mon, Jun 07, 2021 at 12:16:19PM +1200, Avon Robertson wrote:
> Hello Theo, Stuart, and naddy,
> 
> Thank you for your responses. I will do as you have suggested and
> post my findings to misc@ upon completion.
> 
> Regards Avon.
> 
> On Sun, Jun 06, 2021 at 04:38:55PM -0600, Theo de Raadt wrote:
> > Yes a diff we need tested. Snapshots often contain future diffs, being
> > tested, and once in a while those diffs contain errors.
> > 
> > Newer snapshots contain a fix to this diff, another approach is to try a
> > newer snapshot.
> > 
> > 
> > Stuart Henderson  wrote:
> > 
> > > There are some diffs in ssh in snapshots, please try building ssh from
> > > source rather than snapshot and see if it fixes things,
> > > 
> > > $ cd /usr/src/usr.bin/ssh
> > > $ cvs up
> > > $ make obj
> > > $ make
> > > $ doas make install
> > > 
> > > 
> > > On 2021-06-06, Avon Robertson  wrote:
> > > > Hello misc@,
> > > > I have used a shell script containing the following statements since the
> > > > 20th January 2021. It has executed without error until recently. The
> > > > last error free execution was on the 30th May.
> > > >
> > > > #!/bin/ksh
> > > > logfile="$HOME/var/log/updcvs"
> > > > printf "\n$(date)\n" >> $logfile
> > > > > printf "Call reposync to update local /cvs repository\nOutput is logged to $logfile\n"
> > > > > doas -u cvs /usr/local/bin/reposync rsync://anoncvs.au.openbsd.org/cvs /cvs 2>&1 | /usr/bin/tee -a $logfile
> > > > exit $?
> > > >
> > > > Using a previous snapshot, reposync began to report failures as shown in
> > > > my log, on:
> > > > Mon May 31 20:07:02 NZST 2021
> > > > reposync: host key verification failed - see
> > > > /var/db/reposync/known_hosts
> > > >
> > > > The same error was then recorded in my log on the 3rd, 4th, 5th, and
> > > > 6th of June. The above known_hosts file does not exist on this machine.
> > > > The FILES section of reposync(1) I have interpreted as meaning that the
> > > > above known_hosts file, is not needed when the official keys exist in
> > > > file /usr/local/share/reposync/ssh_known_hosts which they do on this
> > > > machine.
> > > >
> > > > Hints as to where the problem is would be very appreciated. I have
> > > > included a dmesg output on the off chance it will contain useful
> > > > information.
> > > >
> > > > Regards Avon.
> > > >
> > > > OpenBSD 6.9-current (GENERIC.MP) #54: Sat Jun  5 09:41:12 MDT 2021
> > > > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > > real mem = 68647477248 (65467MB)
> > > > avail mem = 66551521280 (63468MB)
> > > > random: good seed from bootblocks
> > > > mpath0 at root
> > > > scsibus0 at mpath0: 256 targets
> > > > mainbus0 at root
> > > > bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xe8980 (59 entries)
> > > > bios0: vendor American Megatrends Inc. version "F2" date 03/14/2018
> > > > bios0: Gigabyte Technology Co., Ltd. X470 AORUS ULTRA GAMING
> > > > acpi0 at bios0: ACPI 6.0
> > > > acpi0: sleep states S0 S3 S4 S5
> > > > acpi0: tables DSDT FACP APIC FPDT FIDT SSDT SSDT CRAT CDIT SSDT MCFG 
> > > > HPET SSDT UEFI BGRT IVRS SSDT SSDT WSMT
> > > > acpi0: wakeup devices GPP0(S4) GPP1(S4) GPP3(S4) GPP4(S4) GPP5(S4) 
> > > > GPP6(S4) GPP7(S4) GPP8(S4) GPP9(S4) GPPA(S4) GPPB(S4) GPPC(S4) GPPD(S4) 
> > > > GPPE(S4) GPPF(S4) GP17(S4) [...]
> > > > acpitimer0 at acpi0: 3579545 Hz, 32 bits
> > > > acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> > > > cpu0 at mainbus0: apid 0 (boot processor)
> > > > cpu0: AMD Ryzen 7 2700X Eight-Core Processor, 3700.63 MHz, 17-08-02
> > > > cpu0: 
> > > > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,IBPB,XSAVEOPT,XSAVEC,XGETBV1,XSAV

Re: reposync:host key verification failed

2021-06-06 Thread Avon Robertson
Hello Theo, Stuart, and naddy,

Thank you for your responses. I will do as you have suggested and
post my findings to misc@ upon completion.

Regards Avon.

On Sun, Jun 06, 2021 at 04:38:55PM -0600, Theo de Raadt wrote:
> Yes a diff we need tested. Snapshots often contain future diffs, being
> tested, and once in a while those diffs contain errors.
> 
> Newer snapshots contain a fix to this diff, another approach is to try a
> newer snapshot.
> 
> 
> Stuart Henderson  wrote:
> 
> > There are some diffs in ssh in snapshots, please try building ssh from
> > source rather than snapshot and see if it fixes things,
> > 
> > $ cd /usr/src/usr.bin/ssh
> > $ cvs up
> > $ make obj
> > $ make
> > $ doas make install
> > 
> > 
> > On 2021-06-06, Avon Robertson  wrote:
> > > Hello misc@,
> > > I have used a shell script containing the following statements since the
> > > 20th January 2021. It has executed without error until recently. The
> > > last error free execution was on the 30th May.
> > >
> > > #!/bin/ksh
> > > logfile="$HOME/var/log/updcvs"
> > > printf "\n$(date)\n" >> $logfile
> > > > printf "Call reposync to update local /cvs repository\nOutput is logged to $logfile\n"
> > > > doas -u cvs /usr/local/bin/reposync rsync://anoncvs.au.openbsd.org/cvs /cvs 2>&1 | /usr/bin/tee -a $logfile
> > > exit $?
> > >
> > > Using a previous snapshot, reposync began to report failures as shown in
> > > my log, on:
> > > Mon May 31 20:07:02 NZST 2021
> > > reposync: host key verification failed - see
> > > /var/db/reposync/known_hosts
> > >
> > > The same error was then recorded in my log on the 3rd, 4th, 5th, and
> > > 6th of June. The above known_hosts file does not exist on this machine.
> > > The FILES section of reposync(1) I have interpreted as meaning that the
> > > above known_hosts file, is not needed when the official keys exist in
> > > file /usr/local/share/reposync/ssh_known_hosts which they do on this
> > > machine.
> > >
> > > Hints as to where the problem is would be very appreciated. I have
> > > included a dmesg output on the off chance it will contain useful
> > > information.
> > >
> > > Regards Avon.
> > >
> > > OpenBSD 6.9-current (GENERIC.MP) #54: Sat Jun  5 09:41:12 MDT 2021
> > > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > real mem = 68647477248 (65467MB)
> > > avail mem = 66551521280 (63468MB)
> > > random: good seed from bootblocks
> > > mpath0 at root
> > > scsibus0 at mpath0: 256 targets
> > > mainbus0 at root
> > > bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xe8980 (59 entries)
> > > bios0: vendor American Megatrends Inc. version "F2" date 03/14/2018
> > > bios0: Gigabyte Technology Co., Ltd. X470 AORUS ULTRA GAMING
> > > acpi0 at bios0: ACPI 6.0
> > > acpi0: sleep states S0 S3 S4 S5
> > > acpi0: tables DSDT FACP APIC FPDT FIDT SSDT SSDT CRAT CDIT SSDT MCFG HPET 
> > > SSDT UEFI BGRT IVRS SSDT SSDT WSMT
> > > acpi0: wakeup devices GPP0(S4) GPP1(S4) GPP3(S4) GPP4(S4) GPP5(S4) 
> > > GPP6(S4) GPP7(S4) GPP8(S4) GPP9(S4) GPPA(S4) GPPB(S4) GPPC(S4) GPPD(S4) 
> > > GPPE(S4) GPPF(S4) GP17(S4) [...]
> > > acpitimer0 at acpi0: 3579545 Hz, 32 bits
> > > acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> > > cpu0 at mainbus0: apid 0 (boot processor)
> > > cpu0: AMD Ryzen 7 2700X Eight-Core Processor, 3700.63 MHz, 17-08-02
> > > cpu0: 
> > > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,IBPB,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> > > cpu0: 64KB 64b/line 4-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 
> > > 64b/line 8-way L2 cache
> > > cpu0: ITLB 64 4KB entries fully associative, 64 4MB entries fully 
> > > associative
> > > cpu0: DTLB 64 4KB entries fully associative, 64 4MB entries fully 
> > > associative
> > > cpu0: smt 0, core 0, package 0
> > > mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
> > > cpu0: apic clock running at 100MHz
> > > cpu0: mwait min=64, max=64, IBE

reposync:host key verification failed

2021-06-06 Thread Avon Robertson
Hello misc@,
I have used a shell script containing the following statements since the
20th January 2021. It has executed without error until recently. The
last error free execution was on the 30th May.

#!/bin/ksh
logfile="$HOME/var/log/updcvs"
printf "\n$(date)\n" >> $logfile
printf "Call reposync to update local /cvs repository\nOutput is logged to $logfile\n"
doas -u cvs /usr/local/bin/reposync rsync://anoncvs.au.openbsd.org/cvs /cvs 2>&1 | /usr/bin/tee -a $logfile
exit $?
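
A side note on the last two lines of the script: unless a pipefail option is in effect, $? after a pipeline is the status of its last command, here tee(1), so "exit $?" reports whether tee succeeded rather than whether reposync did. One possible way to keep the log and still return the command's own status is sketched below. It is not part of the original script; the function name run_logged and the temporary status file are assumptions made for illustration.

```sh
#!/bin/sh
# run_logged LOGFILE COMMAND [ARG...]
# Run COMMAND with stdout and stderr shown on the terminal and appended
# to LOGFILE, then return COMMAND's exit status instead of tee's.
run_logged() {
    log=$1; shift
    statfile=$(mktemp) || return 1
    # The brace group records the command's status before the write end
    # of the pipe closes, so the file is complete once tee has exited.
    { "$@" 2>&1; echo "$?" > "$statfile"; } | tee -a "$log"
    status=$(cat "$statfile")
    rm -f "$statfile"
    return "${status:-1}"
}
```

With this helper the final lines of the script could become:
run_logged "$logfile" doas -u cvs /usr/local/bin/reposync rsync://anoncvs.au.openbsd.org/cvs /cvs
exit $?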

Using a previous snapshot, reposync began to report failures as shown in
my log, on:
Mon May 31 20:07:02 NZST 2021
reposync: host key verification failed - see
/var/db/reposync/known_hosts

The same error was then recorded in my log on the 3rd, 4th, 5th, and
6th of June. The above known_hosts file does not exist on this machine.
I have interpreted the FILES section of reposync(1) as meaning that the
above known_hosts file is not needed when the official keys exist in
the file /usr/local/share/reposync/ssh_known_hosts, which they do on this
machine.

Hints as to where the problem is would be very appreciated. I have
included a dmesg output on the off chance it will contain useful
information.

Regards Avon.

OpenBSD 6.9-current (GENERIC.MP) #54: Sat Jun  5 09:41:12 MDT 2021
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 68647477248 (65467MB)
avail mem = 66551521280 (63468MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xe8980 (59 entries)
bios0: vendor American Megatrends Inc. version "F2" date 03/14/2018
bios0: Gigabyte Technology Co., Ltd. X470 AORUS ULTRA GAMING
acpi0 at bios0: ACPI 6.0
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP APIC FPDT FIDT SSDT SSDT CRAT CDIT SSDT MCFG HPET SSDT UEFI BGRT IVRS SSDT SSDT WSMT
acpi0: wakeup devices GPP0(S4) GPP1(S4) GPP3(S4) GPP4(S4) GPP5(S4) GPP6(S4) GPP7(S4) GPP8(S4) GPP9(S4) GPPA(S4) GPPB(S4) GPPC(S4) GPPD(S4) GPPE(S4) GPPF(S4) GP17(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 32 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD Ryzen 7 2700X Eight-Core Processor, 3700.63 MHz, 17-08-02
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,IBPB,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu0: 64KB 64b/line 4-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 8-way L2 cache
cpu0: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu0: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 100MHz
cpu0: mwait min=64, max=64, IBE
cpu1 at mainbus0: apid 1 (application processor)
cpu1: AMD Ryzen 7 2700X Eight-Core Processor, 3700.01 MHz, 17-08-02
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,IBPB,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu1: 64KB 64b/line 4-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 8-way L2 cache
cpu1: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu1: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 2 (application processor)
cpu2: AMD Ryzen 7 2700X Eight-Core Processor, 3700.02 MHz, 17-08-02
cpu2: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,IBPB,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu2: 64KB 64b/line 4-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 8-way L2 cache
cpu2: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu2: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu2: smt 0, core 2, package 0
cpu3 at mainbus0: apid 3 (application processor)
cpu3: AMD Ryzen 7 2700X Eight-Core Processor, 3700.02 MHz, 17-08-02
cpu3: 

Re: Catastrophic

2020-03-08 Thread Avon Robertson
On Sat, Feb 29, 2020 at 07:41:59AM -0800, Justin Noor wrote:
> Awesome - thank you for your time and for the valuable information.
> 
> That’s hilarious about the serial port. I’ll try plugging into a switch,
> reproducing the crash, and SSHing into it. I still haven’t tried the
> syslogd tip you mentioned either. It’s time for me to start learning more
> about X. Will be in touch.
> 
> Regards
> 
> On Fri, Feb 28, 2020 at 6:57 AM Stuart Longland wrote:
> 
> > On 28/2/20 11:32 pm, Justin Noor wrote:
> > > Thanks for offering to help and sorry for the delay - I got dragged into a
> > > work emergency. I finally managed to SCP my dmesg to a remote machine.
> >
> > Heh, no problems, these things happen.
> >
> > > As a refresher I have a 6.6 current machine that crashes when X is running,
> > > and almost instantly when Firefox is running - it runs fine without X. The
> > > machine becomes totally frozen - I have to perform a forced shutdown to
> > > exit this state. The issue appears to be graphics related and is
> > > inconsistent - sometimes it crashes immediately, other times it does not.
> >
> > Sometimes it might be the way a particular graphics toolkit "tickles"
> > the video hardware too.  For instance FVWM uses libxcb for drawing
> > graphics which means you're likely to be just working with 2D primitives.
> >
> > Then Firefox with its GTK+ back-end fires off a few RENDER extension
> > requests to the X server and whoopsie!  Down she goes!
> >
> > > There are indeed some "unknown product" messages related to my PCI graphics
> > > card in my dmesg, but I haven't been able to decipher them yet. Those
> > > usually mean the device is not supported, but it is, and I'm sure I have
> > > the correct driver (amdgpu0). Previously I had no issues for months, which
> > > is why I suspected hardware failure. Admittedly I've been lucky with
> > > graphics cards over the years, and don't know much about PCI.
> >
> > No issues for months running a previous version of OpenBSD or the same
> > you're running now?
> >
> > One suggestion I made too was to maybe try setting up a serial console
> > link… turns out the motherboard makers know how to tease:
> >
> > > com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
> > > com0: probed fifo depth: 0 bytes
> >
> > That says there is a RS-232 port somewhere… so I had a look at the
> > handbook:
> >
> > https://dlcdnets.asus.com/pub/ASUS/mb/SocketAM4/ROG_STRIX_B450-I_GAMING/E14337_ROG_STRIX_B450-I_GAMING_UM_PRINT.pdf
> >
> > They didn't wire it up to a pin header, which is annoying.
> >
> > On the video front, I did see this:
> > > initializing kernel modesetting (POLARIS11 0x1002:0x67EF 0x1002:0x0B04
> > > 0xE5).
> > > amdgpu_irq_add_domain: stub
> > > amdgpu_device_resize_fb_bar: stub
> > > amdgpu: [powerplay] Failed to retrieve minimum clocks.
> > > amdgpu0: 1360x768, 32bpp
> > > wsdisplay0 at amdgpu0 mux 1: console (std, vt100 emulation), using wskbd0
> > > wskbd1: connecting to wsdisplay0
> > > wsdisplay0: screen 1-5 added (std, vt100 emulation)
> >
> > The "stub" messages make me wonder if we're hitting some
> > not-yet-implemented features.  That "failed to retrieve minimum clocks"
> > has been seen on Linux as well, and there it was related to PCI prefetch
> > register programming.
> >
> > The machine you've got isn't much different to what I have at work
> > actually: Ryzen 7 1700 (so previous generation), and a RX550 video card
> > (POLARIS12, maybe slightly newer?)… the machine is fitted with a RS-232
> > serial port so I might try a little experiment with a USB stick and see
> > if I can install OpenBSD 6.6 to USB storage and try to reproduce the crash.
> > --
> > Stuart Longland (aka Redhatter, VK4MSL)
> >
> > I haven't lost my mind...
> >   ...it's backed up on a tape somewhere.
> >

Hello Justin and Stuart,

It is possible that the errors that I have found in /var/log/messages*
are unrelated to the above. Thoughts?

I have noticed that the freezes on this machine occur more quickly if I
am working within tmux(1), as I was at the time the last freeze
occurred. That may have been sheer coincidence.

$ grep ERROR /var/log/messag*
/var/log/messages:Mar  8 16:20:10 gx470 /bsd: [drm] *ERROR* ring gfx timeout, signaled seq=385, emitted seq=387
/var/log/messages:Mar  9 07:06:34 gx470 /bsd: [drm] *ERROR* Illegal register access in command stream
/var/log/messages:Mar  9 07:06:44 gx470 /bsd: [drm] *ERROR* ring gfx timeout, signaled seq=794, emitted seq=796

My machine's last freeze occurred at the time of the last error in
/var/log/messages. When the machine is frozen I am still able to log in
to it remotely and access files, using kermit(1) and a USB to Serial
adapter. The machine's /var/run/dmesg.boot can be found in my first
email to this thread.

Regards Avon

-- 
aer
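
To line up freeze times with the "ring gfx timeout" errors shown above, the timestamps can be pulled out of the log files on their own. The following is a rough sketch: the function name gfx_timeouts is hypothetical, and it assumes the syslog layout seen in the grep output above, where the first three fields are month, day, and time.

```sh
#!/bin/sh
# gfx_timeouts FILE...
# Print "Mon DD HH:MM:SS" for every "ring gfx timeout" line in the
# given log files. grep -h suppresses the file name prefix.
gfx_timeouts() {
    grep -h 'ring gfx timeout' "$@" | awk '{ print $1, $2, $3 }'
}
```

It could be run as: gfx_timeouts /var/log/messages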



Re: Catastrophic

2020-02-28 Thread Avon Robertson
On Sat, Feb 29, 2020 at 12:57:07AM +1000, Stuart Longland wrote:
> On 28/2/20 11:32 pm, Justin Noor wrote:
> > Thanks for offering to help and sorry for the delay - I got dragged into a
> > work emergency. I finally managed to SCP my dmesg to a remote machine.
> 
> Heh, no problems, these things happen.
> 
> > As a refresher I have a 6.6 current machine that crashes when X is running,
> > and almost instantly when Firefox is running - it runs fine without X. The
> > machine becomes totally frozen - I have to perform a forced shutdown to
> > exit this state. The issue appears to be graphics related and is
> > inconsistent - sometimes it crashes immediately, other times it does not.
> 
> Sometimes it might be the way a particular graphics toolkit "tickles"
> the video hardware too.  For instance FVWM uses libxcb for drawing
> graphics which means you're likely to be just working with 2D primitives.
> 
> Then Firefox with its GTK+ back-end fires off a few RENDER extension
> requests to the X server and whoopsie!  Down she goes!
> 
> > There are indeed some "unknown product" messages related to my PCI graphics
> > card in my dmesg, but I haven't been able to decipher them yet. Those
> > usually mean the device is not supported, but it is, and I'm sure I have
> > the correct driver (amdgpu0). Previously I had no issues for months, which
> > is why I suspected hardware failure. Admittedly I've been lucky with
> > graphics cards over the years, and don't know much about PCI.
> 
> No issues for months running a previous version of OpenBSD or the same
> you're running now?
> 
> One suggestion I made too was to maybe try setting up a serial console
> link… turns out the motherboard makers know how to tease:
> 
> > com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
> > com0: probed fifo depth: 0 bytes
> 
> That says there is a RS-232 port somewhere… so I had a look at the handbook:
> https://dlcdnets.asus.com/pub/ASUS/mb/SocketAM4/ROG_STRIX_B450-I_GAMING/E14337_ROG_STRIX_B450-I_GAMING_UM_PRINT.pdf
> 
> They didn't wire it up to a pin header, which is annoying.
> 
> On the video front, I did see this:
> > initializing kernel modesetting (POLARIS11 0x1002:0x67EF 0x1002:0x0B04
> > 0xE5).
> > amdgpu_irq_add_domain: stub
> > amdgpu_device_resize_fb_bar: stub
> > amdgpu: [powerplay] Failed to retrieve minimum clocks.
> > amdgpu0: 1360x768, 32bpp
> > wsdisplay0 at amdgpu0 mux 1: console (std, vt100 emulation), using wskbd0
> > wskbd1: connecting to wsdisplay0
> > wsdisplay0: screen 1-5 added (std, vt100 emulation)
> 
> The "stub" messages make me wonder if we're hitting some
> not-yet-implemented features.  That "failed to retrieve minimum clocks"
> has been seen on Linux as well, and there it was related to PCI prefetch
> register programming.
> 
> The machine you've got isn't much different to what I have at work
> actually: Ryzen 7 1700 (so previous generation), and a RX550 video card
> (POLARIS12, maybe slightly newer?)… the machine is fitted with a RS-232
> serial port so I might try a little experiment with a USB stick and see
> if I can install OpenBSD 6.6 to USB storage and try to reproduce the crash.
> -- 
> Stuart Longland (aka Redhatter, VK4MSL)
> 
> I haven't lost my mind...
>   ...it's backed up on a tape somewhere.
> 

Hello Justin and Stuart,

I hope the following may be of help in solving the cause of the crash.

I have experienced a similar type of crash when using X on this machine
for approximately the last 6 weeks. Prior to this, X had been running on
this machine without apparent problems for 12 plus months.

The only browser installed on this machine is lynx(1). My crashes have
been random with no recognised culprit at the time of the crash, which
usually occurred within 10 minutes of invoking startx(1).

fvwm(1) is the only window manager installed on this machine. All my
crashes have required the machine to be powered off to regain control.

This machine's graphics card was identified by its vendor as a:
  Sapphire Nitro+ RX580 8G GDDR5 Graphics Card 2X HDMI + 2X Display+DVI
  Port.
This machine is connected to its monitor using a DisplayPort cable.

This machine has worked and is working without problems from a console,
with and without tmux(1). If multiple consoles are run at the same time
however, when exit(3) is invoked from one of them the time taken to
exit is sometimes longer than 10 seconds. This seems odd to me.

Please find below the contents of this machine's /var/run/dmesg.boot.

OpenBSD 6.6-current (GENERIC.MP) #0: Sun Feb 23 00:07:16 MST 2020
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 68644982784 (65464MB)
avail mem = 66551980032 (63468MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xe8980 (59 entries)
bios0: vendor American Megatrends Inc. version "F1" date 03/01/2018
bios0: Gigabyte Technology Co., Ltd. X470 AORUS ULTRA GAMING
acpi0 at bios0: ACPI 6.0
acpi0: sleep states S0 S3