Re: Patched macOS kexts start Raven iGPU, but GPUVM page fault occurs on the first GFX and SDMA IB submitted by WindowServer. Help?

2023-02-06 Thread Visual (VisualDevelopment)
While simply waiting for a reply to the email was an attractive option, we 
chose to investigate other parts of the code during the last three days.

More precisely, we investigated _mmhub_1_0_update_medium_grain_clock_gating and 
seemed to have discovered a register offset mismatch. However, we soon found 
that the HWIP discovery code automatically applies Raven-specific fixes to the 
offset. Therefore, the behaviour is correct in the end.
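
For readers wondering how such a fix can make a raw offset look wrong while
the final address is still right, here is a minimal sketch. It assumes a
convention like the Linux amdgpu SOC15_REG_OFFSET macro, where the final
register offset is a per-ASIC segment base plus a relative offset. The struct,
the function name and all numeric values are made up for illustration and are
not taken from the kexts.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SEGMENTS 5

/* Hypothetical per-IP segment bases, as a discovery step might fill them in. */
struct ip_base {
    uint32_t segment[MAX_SEGMENTS];
};

/* Final dword offset = segment base selected by base_idx + relative offset. */
uint32_t reg_offset(const struct ip_base *ip, unsigned base_idx, uint32_t rel)
{
    assert(base_idx < MAX_SEGMENTS);
    return ip->segment[base_idx] + rel;
}

/* Made-up bases: the "Raven" table differs only in segment 1. */
const struct ip_base generic_mmhub = { { 0x1000, 0x2000, 0, 0, 0 } };
const struct ip_base raven_mmhub   = { { 0x1000, 0x2400, 0, 0, 0 } };
```

With these made-up tables, the same relative offset resolves to a different
final offset on each ASIC, so comparing only the relative offsets (or only the
bases) can look like a mismatch when the sum is actually correct.
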
We also thought of running the WhateverRed kext on a Vega machine to see how 
the kext behaves differently on a dGPU with a similar architecture. The problem 
is that we need a tester who owns a Vega card to do that. Nyan Cat sent a 
message asking for assistance on the AMD OS X server but got no reply. In the 
meantime, Visual asked a friend who owns a Vega card whether he would be 
willing to help, but has also received no response to date.

We realise that we started the thread at an odd time (Saturday) and with an 
unformatted subject, not to mention a few typos we noticed right after sending 
the email. We hope that these flaws won't dampen your interest in the project.


Patched macOS kexts start Raven iGPU, but GPUVM page fault occurs on the first GFX and SDMA IB submitted by WindowServer. Help?

2023-02-04 Thread Visual (VisualDevelopment)
Table of Contents:
1. Introduction
2. History of WhateverRed
2.1. Wrapping/Redirecting kext logic with Lilu
2.2. VTables and our Reverse Engineering
2.3. Debugging with a black screen
2.4. Firmware injection and other HWLibs troubles
2.5. AMDRadeonX5000 Video Decoding/Encoding and SDMA engine mismatches
2.6. SDMA0 power on via SMC
2.7. SDMA0 Accel channel skipping memory mapping commands
3. Current issue
3.1. VM Protection Faults
3.2. Analysis of the diagnostic dump
3.3. A deeper dive into the protection fault
4. What we know so far
4.1. The VM Blocks and the PDEs/PTEs
4.2. The VM registers
4.3. The PDE/PTE flags
4.4. The translate_further mode
4.5. The VMPTConfig in AMD kexts
4.6. How the entryCount is determined on AMDGPU
4.7. The GPUVM settings on AMDRadeonX5000 vs. AMDGPU
5. What we have tried
5.1. PTE/PDE flags experimentations
5.2. Experimentation with VMPTConfig and related settings
6. How you can help
6.1. Unanswered questions
6.2. Ways to contact us


-- 1. Introduction --
Hello everyone.
We are a small team of 3 people trying to get graphics acceleration working on 
Hackintoshes (PCs running macOS), specifically AMD laptops with AMD (Vega) 
iGPUs (Raven/Raven2/Renoir and their derivatives, such as Picasso).
To be precise, we are fixing broken and/or missing logic by patching the 
existing kexts: currently AMDRadeonX5000 for GCN 5 (GFX 9), AMDRadeonX6000 
for VCN (GFX 10), and AMDRadeonX6000Framebuffer for DCN (instead of 
AMD1Controller, since that one is for DCE).

The team members are:
- Visual, the Project Owner, is a 17-year-old Greek CS student with extensive 
knowledge of Operating System development. He writes most of the kext code and 
provides insight on OS and Driver behaviour when possible.
- NyanCatTW1, the Automation Engineer, is a 17-year-old student who lives in 
Taiwan. He was admitted to the NYCU CSIE last year. He also does most of the 
Reverse Engineering.
- Allen Chen, the tester, has a Renoir laptop, perseverance and some ideas. He 
helps with the effort occasionally and is currently striving to become 
NyanCatTW1's classmate again, as they were six years ago.

Our kext, WhateverRed, has successfully gotten the aforesaid kexts to 
deterministically power up and start the IPs/MEs in the GPU, such as GFX and 
SDMA. Below are highlights of a dmesg log from the main testing system:

[   27.351538]: netdbg: Disabled via boot arg
[   27.351543]: rad: patching device type table
[   27.351558]: rad: Automagically getting VBIOS from VFCT table
...
[   27.505319]: [3:0:0] [Accel] >>> Calling TTL::initialize()
[   27.505331]: [AMD INFO] TTL Interface: Boot mode Normal.
...
[   27.649777]: [3:0:0] [Accel] <<< TTL::initialize() Completed successfully.
...
[   27.662027]: Accelerator successfully registered with controller.
...
[   29.346963]: rad: _SmuRaven_Initialize returned 0x1
[   29.346967]: rad: Sending PPSMC_MSG_PowerUpSdma (0xE) to the SMC
[   29.347052]: rad: _Raven_SendMsgToSmcWithParameter returned 0x1
...
[   29.365343]: rad: powerUpHW: this = 0xff935ca3d000
[   29.377219]: rad: powerUpHW returned 1
[   29.377228]: [3:0:0]: Controller is enabled, finish initialization
[   29.424252]: Adding AGDP mode validate property
[   29.425160]: kPEDisableScreen 1
[   29.425685]: [3:0:0] [FB:0] AmdRadeonFramebuffer::setCursorImage() !!! Driver is offline.
[   29.425695]: [3:0:0] [FB:1] AmdRadeonFramebuffer::setCursorImage() !!! Driver is offline.


The project is hosted on GitHub (https://github.com/NootInc/WhateverRed) with 
135 stargazers as of 2023-02-04.

Currently, everything seems to go smoothly up to the point where WindowServer 
tries, and fails, to make use of the iGPU (see Chapter 3 for details).
We first ran into the issue on 2022-11-27, but as of 2023-02-04, we haven't 
been able to find a way to fix it.
This is why we're asking for help on the amd-gfx mailing list. However, 
considering the complexity of both the project and the issue, we think it is 
necessary to first give you a brief review of the project's history, the issue 
we are currently facing, everything we know about the issue, and what we have 
tried.
It'll be a long ride (about 25 minutes), so feel free to skip right to Chapter 
6 if you don't have the time.

-- 2. History of WhateverRed --
For your interest, we have documented a large portion of our previous work 
here. But feel free to skip to the problem itself (Chapter 3) in case that's 
more practical for you.


-- 2.1 Wrapping/Redirecting kext logic with Lilu --
First of all, you are probably wondering how we are even debugging these 
kexts, let alone modifying them; the answer is Lilu. Lilu allows you to hook 
symbols and replace them with your own logic, while saving the original to a 
different place. This is made possible by looking for the symbol, 
saving the original logic,