Public bug reported:
[Description]
This patch stack backports the CXL enablement needed by DGX CXL
platforms to the linux-nvidia-7.0 kernel. It combines the CXL
dependency/Type-2 stack with the CXL state save/restore and
reset stack.
It includes:
1. CXL Type-2 and accelerator support
This brings in the CXL Type-2 foundation and accelerator CXL
plumbing required for accelerator-attached CXL memory devices:
- CXL Type-2 support in cxl_dev_state initialization
- Exported CXL internals needed by external Type-2 drivers
- Accelerator CXL device creation and CXL register mapping
- Type-2 memdev creation and attach-region handling
- Avoiding DAX creation for accelerator memdevs
2. DAX/HMEM and Soft Reserved coordination
This carries the DAX/HMEM coordination needed so Soft Reserved
memory ownership is resolved correctly with CXL regions:
- Deferred dax_cxl binding until dax_hmem ownership resolution
- Soft Reserved containment checks against committed CXL regions
- Reintroduction of Soft Reserved ranges into the iomem tree
- DEV_DAX_CXL gating and cxl_acpi/cxl_pci module request
ordering
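The containment check in item 2 can be illustrated with a small Python model. This is only a sketch, not the kernel code: the `Range` type, the function names, and the "cxl"/"hmem" ownership rule (a Soft Reserved range goes to the CXL path only when it sits entirely inside one committed region) are assumptions made for illustration.

```python
from typing import Iterable, NamedTuple


class Range(NamedTuple):
    start: int  # first byte address, inclusive
    end: int    # last byte address, inclusive


def contains(outer: Range, inner: Range) -> bool:
    """True when `inner` lies entirely within `outer`."""
    return outer.start <= inner.start and inner.end <= outer.end


def soft_reserved_owner(soft_reserved: Range,
                        committed_regions: Iterable[Range]) -> str:
    """Pick an owner for one Soft Reserved range (hypothetical policy).

    Returns "cxl" when some committed CXL region fully contains the
    range, otherwise "hmem".
    """
    if any(contains(region, soft_reserved) for region in committed_regions):
        return "cxl"
    return "hmem"
```

A range that only partially overlaps a committed region falls through to "hmem" in this model, which is why the check is containment rather than overlap.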
3. CXL configuration and platform dependencies
The stack includes CXL config annotations and related platform
dependencies:
- CXL Type-2 and RAS config annotations
- CXL DAX and KMEM config enablement
- PCI_CXL annotation for CXL state save/restore
- ATS enablement dependencies needed by pre-CXL and
CXL.cache-capable devices
4. PCI CXL state save/restore
This backports Srirangan Madhavan's CXL state save/restore
series:
- CXL DVSEC control, lock, and range register definitions
- Public CXL HDM decoder/register-map definitions
- PCI virtual extended capability save buffer support
- CXL DVSEC state save/restore across resets
- HDM decoder state save/restore
- PCI CXL save/restore wiring via drivers/pci/cxl.c
5. CXL reset v5 support
This also backports the CXL reset v5 series:
- Revert of the older single-commit CXL reset implementation
- CXL DVSEC reset and capability register definitions
- Export of pci_dev_save_and_disable() and pci_dev_restore()
- CXL memory offlining and cache flush helpers
- Multi-function sibling coordination for CXL reset
- Full CXL reset flow orchestration
- cxl_reset sysfs interface for PCI devices
- ABI documentation for the cxl_reset sysfs attribute
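For orientation, a userspace helper that triggers the cxl_reset attribute might look like the sketch below. The attribute location under /sys/bus/pci/devices/<BDF>/ and the value "1" are assumptions based on how the existing PCI `reset` attribute behaves; the ABI documentation added by this stack is the authority. The helper names are hypothetical.

```python
import re
from pathlib import Path

# Domain:bus:device.function, e.g. 0000:01:00.0
_BDF_RE = re.compile(r"^[0-9a-f]{4,}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def cxl_reset_path(bdf: str, sysfs_root: str = "/sys") -> Path:
    """Build the assumed path of the cxl_reset attribute for a PCI device."""
    bdf = bdf.lower()
    if not _BDF_RE.match(bdf):
        raise ValueError(f"not a full PCI address: {bdf!r}")
    return Path(sysfs_root) / "bus" / "pci" / "devices" / bdf / "cxl_reset"


def trigger_cxl_reset(bdf: str, sysfs_root: str = "/sys") -> None:
    """Request a CXL reset; writing "1" is an assumption, check the ABI doc."""
    cxl_reset_path(bdf, sysfs_root).write_text("1")
```

Keeping `sysfs_root` as a parameter lets the path logic be exercised against a temporary directory without touching real hardware.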
[Justification]
This backport is required for DGX CXL enablement on the
linux-nvidia-7.0 kernel. The combined stack enables CXL Type-2
accelerator memory support, correct Soft Reserved ownership
handling, CXL PCI state preservation, and controlled CXL device
reset flows.
Without this stack:
- Type-2 accelerator CXL memory devices cannot be represented
correctly
- Accelerator CXL device plumbing is missing
- Soft Reserved memory may be claimed by the wrong DAX path
- PCI reset paths can lose CXL DVSEC/HDM decoder state
- The newer CXL reset flow and cxl_reset sysfs interface are
unavailable
Source Patch Breakdown
1. CXL dependency and Type-2 backport
Includes Type-2 CXL, accelerator CXL plumbing, DAX/HMEM Soft
Reserved coordination, CXL interleaving support, RAS/config
annotations, and platform dependencies.
2. CXL state save/restore and reset backport
Includes Srirangan Madhavan's CXL state save/restore series and
CXL reset v5 series, plus the cxl_reset sysfs ABI documentation.
Branch / Review Context
Current branch:
bug-DGX-16137/cxl-backport-26.04-bos-nvpr
Base branch:
bug-DGX-16136/cxl-backport-26.04-bos
The final branch range contains only the CXL save/restore and
reset commits after the CXL dependency/Type-2 base. Duplicate
DAX commits left over from the rebase were removed.
[Testing]
Build Validation:
- Remote arm64 nvidia-bos whole-kernel build passed for the
final CXL save/restore and reset stack.
- The build command covered the Image, modules, and dtbs targets.
- The build produced vmlinux and arch/arm64/boot/Image.
Reset Validation:
- CXL reset validation passed on DGX CXL devices.
- CXL control/range state was preserved across reset.
- No fatal CXL/PCI/AER/DPC dmesg messages were observed.
- ResetComplete transitioned as expected and ResetError remained
clear.
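The pass criteria above can be expressed as two small checks. This is a sketch under stated assumptions: the ResetComplete and ResetError bit positions follow one reading of the CXL DVSEC Status2 register layout and should be verified against the register definitions in this stack, and the register names used in the snapshot dictionaries are made up for illustration.

```python
from typing import Dict, Tuple

# Assumed bit positions in the CXL DVSEC Status2 register.
STATUS2_RESET_COMPLETE = 1 << 1
STATUS2_RESET_ERROR = 1 << 2


def reset_succeeded(status2: int) -> bool:
    """ResetComplete must be set and ResetError must be clear."""
    complete = bool(status2 & STATUS2_RESET_COMPLETE)
    error = bool(status2 & STATUS2_RESET_ERROR)
    return complete and not error


def changed_registers(before: Dict[str, int],
                      after: Dict[str, int]) -> Dict[str, Tuple[int, object]]:
    """Return registers whose value differs between two snapshots.

    An empty result means the captured state was preserved across reset.
    A register missing from `after` is reported with None.
    """
    return {
        name: (value, after.get(name))
        for name, value in before.items()
        if after.get(name) != value
    }
```

Capturing the snapshots themselves (for example from config-space dumps) is left out because the report does not say how the state was read.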
Static Validation:
- Branch audit found no duplicate commits against the CXL
dependency/Type-2 base after cleanup.
- The review finding about reset_done() dereferencing an ERR_PTR
endpoint was fixed in the reset orchestration commit.
- Focused static guard/order check passed.
- checkpatch on the touched CXL diff passed.
Config Verification:
The stack includes config annotations for:
- CXL_BUS
- CXL_PCI
- CXL_MEM
- CXL_REGION
- CXL_RAS
- CXL DAX/KMEM support
- PCI_CXL
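One way to confirm the annotations took effect is to scan the generated kernel config for the listed symbols. The sketch below is an assumption about how that could be scripted, not the method used for this report. The exact DAX/KMEM symbol names are not given above, so they are left out of the default list and can be passed in explicitly.

```python
from typing import Dict, Iterable, List

# Symbols named in the list above (without the CONFIG_ prefix).
REQUIRED = ("CXL_BUS", "CXL_PCI", "CXL_MEM", "CXL_REGION", "CXL_RAS", "PCI_CXL")


def parse_kconfig(text: str) -> Dict[str, str]:
    """Map symbol name to value for every CONFIG_x=value line."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        # "# CONFIG_FOO is not set" lines start with '#' and are skipped.
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            values[name[len("CONFIG_"):]] = value
    return values


def missing_symbols(text: str, required: Iterable[str] = REQUIRED) -> List[str]:
    """Return required symbols that are neither built in (y) nor modular (m)."""
    values = parse_kconfig(text)
    return [sym for sym in required if values.get(sym) not in ("y", "m")]
```

Feeding it the contents of the build's .config and getting an empty list back would indicate that every listed symbol is enabled.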
[Notes]
This series depends on the CXL dependency/Type-2 backport and
layers CXL state save/restore plus reset support on top.
The older single-commit CXL reset implementation is reverted
before applying the newer reset v5 flow to avoid duplicated
reset helpers and stale DVSEC definitions.
** Affects: linux-nvidia-7.0 (Ubuntu)
Importance: Undecided
Status: New
** Also affects: linux-nvidia-7.0 (Ubuntu)
Importance: Undecided
Status: New
** No longer affects: linux-nvidia-6.17 (Ubuntu)
--
https://bugs.launchpad.net/bugs/2153819
Title:
CXL: Backport Type-2, state save/restore, and reset support