Public bug reported:

[ Impact ]

The amdgpu probing sequence can fail for any number of reasons
(missing firmware, missing NPI support, unexpected hardware response).

Ideally, we would have all NPI support in Ubuntu at product launch, but
due to the kernel schedule this isn't always feasible, particularly for
dGPUs.

When NPI support is missing on a dGPU specifically, we still want to be
able to identify the system so that we can potentially offer a DKMS
package from the amdgpu-install script.

To support this, changes have been introduced that export IP discovery
information to userspace even when the probe fails.
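
As a rough illustration of how a userspace tool might use this, the
Python sketch below scans PCI devices for AMD hardware that has an
ip_discovery directory but no bound driver. It assumes the directory
appears under the PCI device's sysfs node
(/sys/bus/pci/devices/<address>/ip_discovery) after a failed probe. That
location, the function name find_amd_devices and the sysfs_root
parameter are assumptions for illustration and are not taken from the
patches or from amdgpu-install.

```python
from pathlib import Path

AMD_VENDOR_ID = "0x1002"  # PCI vendor ID used by AMD/ATI GPUs


def find_amd_devices(sysfs_root="/sys"):
    """List AMD PCI devices with their driver and ip_discovery state.

    Hypothetical helper: the ip_discovery location is an assumption.
    """
    results = []
    pci_root = Path(sysfs_root) / "bus" / "pci" / "devices"
    if not pci_root.is_dir():
        return results
    for dev in sorted(pci_root.iterdir()):
        vendor_file = dev / "vendor"
        if not vendor_file.is_file():
            continue
        if vendor_file.read_text().strip().lower() != AMD_VENDOR_ID:
            continue
        results.append({
            "address": dev.name,
            # The "driver" link only exists while a driver is bound.
            "driver_bound": (dev / "driver").exists(),
            "has_ip_discovery": (dev / "ip_discovery").is_dir(),
        })
    return results
```

A device reported with driver_bound False and has_ip_discovery True
would be a candidate for the "probe failed, but the hardware can still
be identified" case described above.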

[ Fix ]

The major patchsets are:

1. drm/amdgpu: Export ip_discovery sysfs on probe failure[1]
2. drm/amdgpu: clean up discovery and preempt sysfs entries on shutdown[2]
3. drm/amdgpu: don't free standalone ip_discovery sysfs in sysfs_fini[3]

along with their dependent patches. See [ Additional information ] for
the full list.

[ Test plan ]

On an APU/dGPU already supported by Ubuntu out of the box:

    1) Remove a GPU firmware file from /lib/firmware.
    2) Reboot the system.
    3) Verify that amdgpu reports a probe error.
    4) Verify that IP discovery info is present in sysfs.
    5) Unload amdgpu.
    6) Restore the GPU firmware.
    7) Reload amdgpu.
    8) Verify that amdgpu works.

On a new APU/dGPU:

    1) Verify that amdgpu reports a probe error.
    2) Verify that IP discovery info is present in sysfs.
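
For the "IP discovery info in sysfs" checks, a small helper along these
lines could dump whatever the kernel exposes so it can be inspected or
compared before and after the change. This is only a sketch: the
function name dump_sysfs_tree is hypothetical, and it deliberately makes
no assumption about which attribute files exist below the directory it
is given.

```python
import os


def dump_sysfs_tree(top):
    """Return {relative_path: text} for every readable file below top."""
    entries = {}
    # os.walk does not descend into directory symlinks by default, which
    # avoids looping through sysfs back-links.
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", errors="replace") as handle:
                    text = handle.read().strip()
            except OSError:
                # Some sysfs attributes are write-only or fail on read.
                continue
            entries[os.path.relpath(path, top)] = text
    return entries
```

It would be called with the ip_discovery directory of the device under
test, for example
dump_sysfs_tree("/sys/bus/pci/devices/0000:03:00.0/ip_discovery"),
where the PCI address is a placeholder and the path is assumed. An empty
result means nothing was exported at that path.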

[ Where problems could occur ]

This changes amdgpu module load/unload behavior and the internal
implementation of some of its sysfs files. Bugs in that implementation
may affect tools that actively read those sysfs files (most likely
AMD's own tools), or the loading and unloading of the amdgpu module.


[ Additional information ]

[1] https://lore.kernel.org/amd-gfx/[email protected]/
[2] https://lore.kernel.org/amd-gfx/[email protected]/
[3] https://lore.kernel.org/amd-gfx/[email protected]/

The full patch list to support this on a 7.0-based kernel:

```
UBUNTU: SAUCE: drm/amdgpu: don't free standalone ip_discovery sysfs in sysfs_fini
drm/amdgpu: clean up discovery and preempt sysfs entries on shutdown
drm/amdgpu: Export ip_discovery sysfs on probe failure
drm/amdgpu: Fix discovery offset check under VF
drm/amdgpu: fix IP discovery v0 handling
drm/amdgpu: Avoid NULL dereference in discovery topology coredump path v3
drm/amdgpu: fallback to default discovery offset/size in sriov guest
drm/amdgpu: New interface to get IP discovery binary v3
drm/amdgpu/discovery: use common function to check discovery table
drm/amdgpu/discovery: support new discovery binary header
drm/amdgpu: include ip discovery data in devcoredump
```

** Affects: linux-oem-7.0 (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: originate-from-2158468

** Tags added: originate-from-2158468

https://bugs.launchpad.net/bugs/2158690

Title:
  [SRU] Export IP discovery even if amdgpu probe fails
