Re: AMD gpu RX 6600 not recognized

2023-09-21 Thread Jonathan Gray
On Thu, Sep 21, 2023 at 11:40:24AM -0400, Solène Rapenne wrote:
> On Thu, 2023-09-21 at 17:50 +1000, Jonathan Gray wrote:
> > On Thu, Sep 21, 2023 at 09:05:50AM +0200, Solène Rapenne wrote:
> > > > Synopsis:   my GPU AMD Sapphire RX 6600 isn't recognized
> > > > Category:   kernel
> > > > Environment:
> > > System  : OpenBSD 7.4
> > > Details : OpenBSD 7.4-beta (GENERIC.MP) #1372: Wed Sep 20 
> > > 09:43:54 MDT 2023
> > >  
> > > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > 
> > > Architecture: OpenBSD.amd64
> > > Machine : amd64
> > > > Description:
> > > I can't get accelerated graphics with a Sapphire RX 6600.
> > >     The amdgpu firmware is correctly installed after running fw_update
> > 
> > > [drm] *ERROR* visible_vram_size 1ff00 or aper_base_kaddr 0x0 is not 
> > > initialized.
> > > [drm] *ERROR* Failed to process memory training!
> > > [drm] *ERROR* sw_init of IP block  failed -22
> > > drm:pid0:amdgpu_device_init *ERROR* amdgpu_device_ip_init failed
> > > drm:pid0:amdgpu_attachhook *ERROR* Fatal error during GPU init
> > > efifb0 at mainbus0: 1920x1080, 32bpp
> > 
> > Does the bios have an option to disable resizable pci bar?
> > 
> > Doing an install with csm enabled (vga instead of efifb)
> > may also change things.
> 
> Disabling resizable BAR in the BIOS got the GPU recognized, but for some 
> reason GDM started without displaying anything. I enabled auto login into 
> GNOME to work around the GDM issue and ended up in a frozen GNOME session 
> with only gnome-shell loaded.
> 
> With xenodm and cwm it worked
> 
> until I lost the screen. ssh was still working, so I could get at the dmesg, 
> which has some GPU-related messages

does this diff change what happens?

Index: sys/dev/pci/drm/amd/amdgpu/amdgpu_gmc.c
===================================================================
RCS file: /cvs/src/sys/dev/pci/drm/amd/amdgpu/amdgpu_gmc.c,v
retrieving revision 1.10
diff -u -p -r1.10 amdgpu_gmc.c
--- sys/dev/pci/drm/amd/amdgpu/amdgpu_gmc.c	19 Jun 2023 00:38:02 -0000	1.10
+++ sys/dev/pci/drm/amd/amdgpu/amdgpu_gmc.c	22 Sep 2023 03:37:10 -0000
@@ -670,15 +670,6 @@ void amdgpu_gmc_get_vbios_allocations(st
 	} else {
 		size = amdgpu_gmc_get_vbios_fb_size(adev);
 
-#ifdef __amd64__
-		/*
-		 * XXX Workaround for machines where the framebuffer
-		 * size reported by the hardware is incorrect.
-		 */
-		extern psize_t efifb_stolen();
-		size = max(size, efifb_stolen());
-#endif
-
 		if (adev->mman.keep_stolen_vga_memory)
 			size = max(size, (unsigned)AMDGPU_VBIOS_VGA_ALLOCATION);
 	}



Re: AMD gpu RX 6600 not recognized

2023-09-21 Thread Solène Rapenne
On Thu, 2023-09-21 at 17:50 +1000, Jonathan Gray wrote:
> On Thu, Sep 21, 2023 at 09:05:50AM +0200, Solène Rapenne wrote:
> > > Synopsis:   my GPU AMD Sapphire RX 6600 isn't recognized
> > > Category:   kernel
> > > Environment:
> > System  : OpenBSD 7.4
> > Details : OpenBSD 7.4-beta (GENERIC.MP) #1372: Wed Sep 20 
> > 09:43:54 MDT 2023
> >  
> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > 
> > Architecture: OpenBSD.amd64
> > Machine : amd64
> > > Description:
> > I can't get accelerated graphics with a Sapphire RX 6600.
> >     The amdgpu firmware is correctly installed after running fw_update
> 
> > [drm] *ERROR* visible_vram_size 1ff00 or aper_base_kaddr 0x0 is not 
> > initialized.
> > [drm] *ERROR* Failed to process memory training!
> > [drm] *ERROR* sw_init of IP block  failed -22
> > drm:pid0:amdgpu_device_init *ERROR* amdgpu_device_ip_init failed
> > drm:pid0:amdgpu_attachhook *ERROR* Fatal error during GPU init
> > efifb0 at mainbus0: 1920x1080, 32bpp
> 
> Does the bios have an option to disable resizable pci bar?
> 
> Doing an install with csm enabled (vga instead of efifb)
> may also change things.

Disabling resizable BAR in the BIOS got the GPU recognized, but for some 
reason GDM started without displaying anything. I enabled auto login into 
GNOME to work around the GDM issue and ended up in a frozen GNOME session 
with only gnome-shell loaded.

With xenodm and cwm it worked

until I lost the screen. ssh was still working, so I could get at the dmesg, 
which has some GPU-related messages

OpenBSD 7.4-beta (GENERIC.MP) #1372: Wed Sep 20 09:43:54 MDT 2023
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 34227875840 (32642MB)
avail mem = 33170821120 (31634MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xcda03000 (64 entries)
bios0: vendor American Megatrends International, LLC. version "A.70" date 06/23/2021
bios0: Micro-Star International Co., Ltd. MS-7C56
efi0 at bios0: UEFI 2.7
efi0: American Megatrends rev 0x50011
acpi0 at bios0: ACPI 6.0
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP SSDT SSDT SSDT FIDT MCFG HPET IVRS VFCT PCCT SSDT CRAT CDIT SSDT SSDT SSDT SSDT WSMT APIC SSDT SSDT SSDT FPDT
acpi0: wakeup devices GP12(S4) GP13(S4) XHC0(S4) GP30(S4) GP31(S4) GPP0(S4) GPP8(S4) SWUS(S4) SWDS(S4) PTXH(S4) PT20(S4) PT24(S4) PT26(S4) PT27(S4) PT28(S4) PT29(S4)
acpitimer0 at acpi0: 3579545 Hz, 32 bits
acpimcfg0 at acpi0
acpimcfg0: addr 0xf000, bus 0-127
acpihpet0 at acpi0: 14318180 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD Ryzen 5 5600X 6-Core Processor, 3700.00 MHz, 19-21-02, patch 0a201204
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,HWPSTATE,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,PQM,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,PKU,IBPB,IBRS,STIBP,STIBP_ALL,IBRS_PREF,IBRS_SM,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 512KB 64b/line 8-way L2 cache, 32MB 64b/line 16-way L3 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 100MHz
cpu0: mwait min=64, max=64, C-substates=1.1, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: AMD Ryzen 5 5600X 6-Core Processor, 3700.00 MHz, 19-21-02, patch 0a201204
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,HWPSTATE,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,PQM,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,PKU,IBPB,IBRS,STIBP,STIBP_ALL,IBRS_PREF,IBRS_SM,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu1: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 512KB 64b/line 8-way L2 cache, 32MB 64b/line 16-way L3 cache
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 4 (application processor)
cpu2: AMD Ryzen 5 5600X 6-Core Processor, 3700.00 MHz, 19-21-02, patch 0a201204
cpu2: 

Re: FS bit on sstatus csr set on riscv64

2023-09-21 Thread Mark Kettenis
> Date: Thu, 21 Sep 2023 10:23:45 +0200
> From: "Peter J. Philipp" 
> 
> Hi,
> 
> I don't know if it's the same on SiFive-based CPUs, but on the D1
> (doesn't boot beyond main() yet) the FS bits are set.  These are floating
> point indicators, and I thought these should be off?  In my debugs I have
> found this:
> 
> 10100111 p
> 80026100
> 
> that is the respective binary and hex register that the CSR gave on my D1.
> I have turned this off in locore.S by unsetting the bits in CSR.  it's
> just 2 instructions more.
> 
> Please have a look in page 39 of this RISCV-privileged (2021) document:
> https://mainrechner.de/riscv-privileged-20211203.pdf
> 
> It is the same bit offset in mstatus and sstatus.
> 
> On the D1 after the CPU is reset the FP bits go back to 0, meaning that on
> its depressive boot-life the FS bits have been turned on.
> 
> to check this I would add a debugging printf high in pmap_bootstrap() that
> looks like so:
> 
>status = csr_read(sstatus);
>printf("sstatus: %lX\n", status);
> 
> In principle I can do this too, but it would take me some time changing source
> trees and recompiling.
> 
> To turn floating point off, I have set this in locore.S:
> 
> 	/* turn off any possible FP bits set */
> 	li	t0, SSTATUS_FS_MASK
> 	csrc	sstatus, t0
> 
> under the pagetable END.
> 
> Best Regards,
> -peter
> 
> PS: If you would like me to keep D1 stuff to myself without relaying findings 
>   back to you let me know.  I know we don't use floating point code
>   in the kernel whatsoever.  Am I wrong?

Right.  This probably fixes itself later, but it is probably best to
clear this early on.  We do clear the FS bits for the secondary CPUs
in cpu_start_secondary().

Need to think about what the best place to do this would be.  But somewhere in
initriscv() is probably good enough.
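
A rough sketch of what clearing FS early could look like, as a small
helper that something like initriscv() might call.  The helper name
clear_fs_early() and the sim_sstatus stand-in are made up for
illustration, and SSTATUS_FS_MASK is assumed here to cover the two-bit
FS field at bits 14:13 as described in the privileged spec.  The csrc
path is only compiled for a riscv kernel build; anywhere else the code
works on a plain variable so the masking can be checked on any host.

```c
#include <stdint.h>

/* FS field of sstatus: bits 14:13 per the RISC-V privileged spec. */
#define SSTATUS_FS_MASK	(UINT64_C(3) << 13)

/*
 * Stand-in for the CSR when not building a riscv kernel (illustration
 * only); initialised to the value reported for the D1 in this thread.
 */
static uint64_t sim_sstatus = 0x80026100;

/* Hypothetical helper: clear any FS bits left set before the kernel runs. */
static inline void
clear_fs_early(void)
{
#if defined(__riscv) && defined(_KERNEL)
	/* csrc clears exactly the bits that are set in the source register. */
	__asm__ volatile("csrc sstatus, %0" :: "r"(SSTATUS_FS_MASK));
#else
	sim_sstatus &= ~SSTATUS_FS_MASK;
#endif
}
```

This mirrors the two-instruction li/csrc sequence from the original
mail, just expressed in C so it could live next to other early
machine-dependent setup.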



FS bit on sstatus csr set on riscv64

2023-09-21 Thread Peter J. Philipp
Hi,

I don't know if it's the same on SiFive-based CPUs, but on the D1
(doesn't boot beyond main() yet) the FS bits are set.  These are floating
point indicators, and I thought these should be off?  In my debugs I have
found this:

10100111 p
80026100

that is the respective binary and hex register that the CSR gave on my D1.
I have turned this off in locore.S by unsetting the bits in CSR.  it's
just 2 instructions more.

Please have a look in page 39 of this RISCV-privileged (2021) document:
https://mainrechner.de/riscv-privileged-20211203.pdf

It is the same bit offset in mstatus and sstatus.

On the D1 after the CPU is reset the FP bits go back to 0, meaning that on
its depressive boot-life the FS bits have been turned on.

to check this I would add a debugging printf high in pmap_bootstrap() that
looks like so:

   status = csr_read(sstatus);
   printf("sstatus: %lX\n", status);

In principle I can do this too, but it would take me some time changing source
trees and recompiling.
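
For reading the printed value, here is a small sketch that pulls the FS
field out of an sstatus value.  The function names sstatus_fs() and
sstatus_fs_name() are made up for this example; the bit position
(14:13) and the four state names are taken from the privileged spec
linked above.

```c
#include <stdint.h>

#define SSTATUS_FS_SHIFT	13
#define SSTATUS_FS_MASK		(UINT64_C(3) << SSTATUS_FS_SHIFT)

/* Return the FS field: 0 = Off, 1 = Initial, 2 = Clean, 3 = Dirty. */
static inline unsigned int
sstatus_fs(uint64_t sstatus)
{
	return (unsigned int)((sstatus & SSTATUS_FS_MASK) >> SSTATUS_FS_SHIFT);
}

/* Human-readable name for an FS field value (hypothetical helper). */
static inline const char *
sstatus_fs_name(unsigned int fs)
{
	static const char *const names[] = {
		"Off", "Initial", "Clean", "Dirty"
	};

	return names[fs & 3];
}
```

Taking the hex value above at face value, sstatus_fs(0x80026100)
returns 3, which is the Dirty state.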

To turn floating point off, I have set this in locore.S:

	/* turn off any possible FP bits set */
	li	t0, SSTATUS_FS_MASK
	csrc	sstatus, t0

under the pagetable END.

Best Regards,
-peter

PS: If you would like me to keep D1 stuff to myself without relaying findings 
back to you let me know.  I know we don't use floating point code
in the kernel whatsoever.  Am I wrong?

-- 
Over thirty years experience on Unix-like Operating Systems starting with QNX.



Re: AMD gpu RX 6600 not recognized

2023-09-21 Thread Jonathan Gray
On Thu, Sep 21, 2023 at 09:55:44AM +0200, Solène Rapenne wrote:
> On Thursday, 21 September 2023 at 17:50 +1000, Jonathan Gray wrote:
> > On Thu, Sep 21, 2023 at 09:05:50AM +0200, Solène Rapenne wrote:
> > > > Synopsis:   my GPU AMD Sapphire RX 6600 isn't recognized
> > > > Category:   kernel
> > > > Environment:
> > > System  : OpenBSD 7.4
> > > Details : OpenBSD 7.4-beta (GENERIC.MP) #1372: Wed Sep
> > > 20 09:43:54 MDT 2023
> > > 
> > > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > 
> > > Architecture: OpenBSD.amd64
> > > Machine : amd64
> > > > Description:
> > > I can't get accelerated graphics with a Sapphire RX 6600.
> > >     The amdgpu firmware is correctly installed after running
> > > fw_update
> > 
> > > [drm] *ERROR* visible_vram_size 1ff00 or aper_base_kaddr 0x0 is
> > > not initialized.
> > > [drm] *ERROR* Failed to process memory training!
> > > [drm] *ERROR* sw_init of IP block  failed -22
> > > drm:pid0:amdgpu_device_init *ERROR* amdgpu_device_ip_init failed
> > > drm:pid0:amdgpu_attachhook *ERROR* Fatal error during GPU init
> > > efifb0 at mainbus0: 1920x1080, 32bpp
> > 
> > Does the bios have an option to disable resizable pci bar?
> > 
> > Doing an install with csm enabled (vga instead of efifb)
> > may also change things.
> 
> I have resizable pci bar enabled, I'll try to disable it.

There are also variables to limit the size of the window and the total,
which correspond to module parameters on linux.

amdgpu_vis_vram_limit

amdgpu_vram_limit

documented in
sys/dev/pci/drm/amd/amdgpu/amdgpu_drv.c

Perhaps this (untested) diff to limit to 512M would help.

Index: sys/dev/pci/drm/amd/amdgpu/amdgpu_ttm.c
===================================================================
RCS file: /cvs/src/sys/dev/pci/drm/amd/amdgpu/amdgpu_ttm.c,v
retrieving revision 1.14
diff -u -p -r1.14 amdgpu_ttm.c
--- sys/dev/pci/drm/amd/amdgpu/amdgpu_ttm.c	13 Aug 2023 10:36:26 -0000	1.14
+++ sys/dev/pci/drm/amd/amdgpu/amdgpu_ttm.c	21 Sep 2023 08:05:19 -0000
@@ -1792,6 +1792,7 @@ int amdgpu_ttm_init(struct amdgpu_device
 	}
 
 	/* Reduce size of CPU-visible VRAM if requested */
+	amdgpu_vis_vram_limit = 512;
 	vis_vram_limit = (u64)amdgpu_vis_vram_limit * 1024 * 1024;
 	if (amdgpu_vis_vram_limit > 0 &&
 	    vis_vram_limit <= adev->gmc.visible_vram_size)
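
To make the effect of that limit concrete, here is a standalone sketch
of the clamp.  The diff context stops at the if condition, so the body
is an assumption: going by the comment, the limit presumably replaces
the visible VRAM size when the condition holds.  The function name
clamp_visible_vram() is made up for this example and is not part of
the driver.

```c
#include <stdint.h>

/*
 * limit_mb is in megabytes, like amdgpu_vis_vram_limit; 0 (or less)
 * means "no limit".  Returns the CPU-visible VRAM size to use, in bytes.
 */
static inline uint64_t
clamp_visible_vram(uint64_t visible_vram_size, int limit_mb)
{
	uint64_t limit = (uint64_t)limit_mb * 1024 * 1024;

	/* Only ever shrink: a limit above the real window is ignored. */
	if (limit_mb > 0 && limit <= visible_vram_size)
		return limit;
	return visible_vram_size;
}
```

With a limit of 512 and an 8G window this yields 512M; with a window
that is already smaller than 512M the size is left untouched.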



Re: AMD gpu RX 6600 not recognized

2023-09-21 Thread Solène Rapenne
On Thursday, 21 September 2023 at 17:50 +1000, Jonathan Gray wrote:
> On Thu, Sep 21, 2023 at 09:05:50AM +0200, Solène Rapenne wrote:
> > > Synopsis:   my GPU AMD Sapphire RX 6600 isn't recognized
> > > Category:   kernel
> > > Environment:
> > System  : OpenBSD 7.4
> > Details : OpenBSD 7.4-beta (GENERIC.MP) #1372: Wed Sep
> > 20 09:43:54 MDT 2023
> > 
> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > 
> > Architecture: OpenBSD.amd64
> > Machine : amd64
> > > Description:
> > I can't get accelerated graphics with a Sapphire RX 6600.
> >     The amdgpu firmware is correctly installed after running
> > fw_update
> 
> > [drm] *ERROR* visible_vram_size 1ff00 or aper_base_kaddr 0x0 is
> > not initialized.
> > [drm] *ERROR* Failed to process memory training!
> > [drm] *ERROR* sw_init of IP block  failed -22
> > drm:pid0:amdgpu_device_init *ERROR* amdgpu_device_ip_init failed
> > drm:pid0:amdgpu_attachhook *ERROR* Fatal error during GPU init
> > efifb0 at mainbus0: 1920x1080, 32bpp
> 
> Does the bios have an option to disable resizable pci bar?
> 
> Doing an install with csm enabled (vga instead of efifb)
> may also change things.

I have resizable pci bar enabled, I'll try to disable it.



Re: AMD gpu RX 6600 not recognized

2023-09-21 Thread Jonathan Gray
On Thu, Sep 21, 2023 at 09:05:50AM +0200, Solène Rapenne wrote:
> >Synopsis:	my GPU AMD Sapphire RX 6600 isn't recognized
> >Category:	kernel
> >Environment:
>   System  : OpenBSD 7.4
>   Details : OpenBSD 7.4-beta (GENERIC.MP) #1372: Wed Sep 20 09:43:54 
> MDT 2023
>
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
>   I can't get accelerated graphics with a Sapphire RX 6600.
> The amdgpu firmware is correctly installed after running fw_update

> [drm] *ERROR* visible_vram_size 1ff00 or aper_base_kaddr 0x0 is not 
> initialized.
> [drm] *ERROR* Failed to process memory training!
> [drm] *ERROR* sw_init of IP block  failed -22
> drm:pid0:amdgpu_device_init *ERROR* amdgpu_device_ip_init failed
> drm:pid0:amdgpu_attachhook *ERROR* Fatal error during GPU init
> efifb0 at mainbus0: 1920x1080, 32bpp

Does the bios have an option to disable resizable pci bar?

Doing an install with csm enabled (vga instead of efifb)
may also change things.