On Wed, Apr 20, 2022 at 7:35 AM Jason Gunthorpe <[email protected]> wrote:
>
> On Tue, Apr 19, 2022 at 05:47:56PM -0700, Dan Williams wrote:
> > On Tue, Apr 19, 2022 at 4:04 PM Jason Gunthorpe <[email protected]> wrote:
> > >
> > > On Tue, Apr 19, 2022 at 02:59:46PM -0700, Dan Williams wrote:
> > >
> > > > ...or are you suggesting to represent CXL free memory capacity in
> > > > iomem_resource and augment the FW list early with CXL ranges. That
> > > > seems doable, but it would only represent the free CXL ranges in
> > > > iomem_resource as the populated CXL ranges cannot have their resources
> > > > reparented after the fact, and there is plenty of code that expects
> > > > "System RAM" to be a top-level resource.
> > >
> > > Yes, something more like this. iomem_resource should represent stuff
> > > actually in use and CXL shouldn't leave behind an 'IOW' for address
> > > space it isn't actually able to currently use.
> >
> > So that's the problem, these gigantic windows need to support someone
> > showing up unannounced with a stack of multi-terabyte devices to add
> > to the system.
>
> In my experience PCIe hotplug is already extremely rare, you may need
> to do this reservation on systems with hotplug slots, but not
> generally. In PCIe world the BIOS often figures this out and bridge
> windows are not significantly over allocated on non-hotplug HW.
>
> (though even PCIe has the resizable bar extension and other things
> that are quite like hotplug and do trigger huge resource requirements)
>
> > > Your whole description sounds like the same problems PCI hotplug has
> > > adjusting the bridge windows.
> >
> > ...but even there the base bounds (AFAICS) are coming from FW (_CRS
> > entries for ACPI described PCIe host bridges). So if CXL follows that
> > model then the entire unmapped portion of the CXL ranges should be
> > marked as an idle resource in iomem_resource.
>
> And possibly yes, because part of the point of this stuff is to
> declare where HW is actually using the address space. So if FW has
> left a host bridge decoder setup to actually consume this space then
> it really has to be set aside to prevent hotplug of other bus types
> from trying to claim the same address space for their own usages.
>
> If no actual decoder is setup then it maybe it shouldn't be left as an
> IOW in the resource tree. In this case it might be better to teach the
> io resource allocator to leave gaps for future hotplug.

Yeah, it is the former. These CXL ranges are all actively decoded by
the CPU complex memory controller as "this range goes to DDR and this
other range is interleaved across this set of CXL host bridges". Even
if there is nothing behind those host bridges there is hardware
actively routing requests that fall into those ranges to those
downstream devices.

> > The improvement that offers over this current proposal is that it
> > allows for global visibility of CXL hotplug resources, but it does set
> > up a discontinuity between FW mapped and OS mapped CXL. FW mapped will
> > have top-level "System RAM" resources indistinguishable from typical
> > DRAM while OS mapped CXL will look like this:
>
> Maybe this can be reotractively fixed up in the resource tree?

I had been discouraged to go that route considering some code only
scans top-level iomem_resource entries, but it is probably better to
try to fix that legacy code to operate correctly when System RAM is
parented by a CXL Range.

Reply via email to