On 22-02-01 12:45:06, Jonathan Cameron wrote:
> On Tue, 25 Jan 2022 19:16:05 -0800
> Dan Williams <[email protected]> wrote:
>
> > From: Ben Widawsky <[email protected]>
> >
> > At this point the subsystem can enumerate all CXL ports (CXL.mem decode
> > resources in upstream switch ports and host bridges) in a system. The
> > last mile is connecting those ports to endpoints.
> >
> > The cxl_mem driver connects an endpoint device to the platform CXL.mem
> > protoctol decode-topology. At ->probe() time it walks its
> > device-topology-ancestry and adds a CXL Port object at every Upstream
> > Port hop until it gets to CXL root. The CXL root object is only present
> > after a platform firmware driver registers platform CXL resources. For
> > ACPI based platform this is managed by the ACPI0017 device and the
> > cxl_acpi driver.
> >
> > The ports are registered such that disabling a given port automatically
> > unregisters all descendant ports, and the chain can only be registered
> > after the root is established.
> >
> > Given ACPI device scanning may run asynchronously compared to PCI device
> > scanning the root driver is tasked with rescanning the bus after the
> > root successfully probes.
> >
> > Conversely if any ports in a chain between the root and an endpoint
> > becomes disconnected it subsequently triggers the endpoint to
> > unregister. Given lock depenedencies the endpoint unregistration happens
> > in a workqueue asynchronously. If userspace cares about synchronizing
> > delayed work after port events the /sys/bus/cxl/flush attribute is
> > available for that purpose.
> >
> > Reported-by: Randy Dunlap <[email protected]>
> > Signed-off-by: Ben Widawsky <[email protected]>
> > [djbw: clarify changelog, rework hotplug support]
> > Signed-off-by: Dan Williams <[email protected]>
> > ---
>
> A few comments inline.
>
> Jonathan
>
> > diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> > index 7bd53dc691ec..df6691d0a6d0 100644
> > --- a/drivers/cxl/acpi.c
> > +++ b/drivers/cxl/acpi.c
> > @@ -314,7 +314,8 @@ static int cxl_acpi_probe(struct platform_device *pdev)
> > if (rc < 0)
> > return rc;
> >
> > - return 0;
> > + /* In case PCI is scanned before ACPI re-trigger memdev attach */
> > + return cxl_bus_rescan();
> > }
> >
> > static const struct acpi_device_id cxl_acpi_ids[] = {
> > @@ -335,3 +336,4 @@ module_platform_driver(cxl_acpi_driver);
> > MODULE_LICENSE("GPL v2");
> > MODULE_IMPORT_NS(CXL);
> > MODULE_IMPORT_NS(ACPI);
> > +MODULE_SOFTDEP("pre: cxl_port");
> I think a comment on 'why' would be useful for the SOFTDEP.
>
> They are rare enough that it might surprise people.
I wonder with Dan's reworks in core/port if this is actually still needed. I'm
thinking it isn't.
>
> > diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> > index 1a50c0fc399c..efbaa851929d 100644
> > --- a/drivers/cxl/core/core.h
> > +++ b/drivers/cxl/core/core.h
> > @@ -6,6 +6,7 @@
> >
> > extern const struct device_type cxl_nvdimm_bridge_type;
> > extern const struct device_type cxl_nvdimm_type;
> > +extern const struct device_type cxl_memdev_type;
> >
> > extern struct attribute_group cxl_base_attribute_group;
> >
> > diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> > index b2773664e407..ee0156419d06 100644
> > --- a/drivers/cxl/core/memdev.c
> > +++ b/drivers/cxl/core/memdev.c
> > @@ -155,13 +155,19 @@ static const struct attribute_group
> > *cxl_memdev_attribute_groups[] = {
> > NULL,
> > };
> >
> > -static const struct device_type cxl_memdev_type = {
> > +const struct device_type cxl_memdev_type = {
>
> Currently this is only exposed for type checking and you also have
> is_cxl_memdev for
> that so seems a bit unnecessary.
Looks like this was leftover after some earlier change. I don't see a reason to
expose it either.
>
> > .name = "cxl_memdev",
> > .release = cxl_memdev_release,
> > .devnode = cxl_memdev_devnode,
> > .groups = cxl_memdev_attribute_groups,
> > };
> >
> > +bool is_cxl_memdev(struct device *dev)
> > +{
> > + return dev->type == &cxl_memdev_type;
> > +}
> > +EXPORT_SYMBOL_NS_GPL(is_cxl_memdev, CXL);
> > +
> > /**
> > * set_exclusive_cxl_commands() - atomically disable user cxl commands
> > * @cxlds: The device state to operate on
> > @@ -213,6 +219,15 @@ static void cxl_memdev_unregister(void *_cxlmd)
> > put_device(dev);
> > }
> >
> > +static void detach_memdev(struct work_struct *work)
> > +{
> > + struct cxl_memdev *cxlmd;
> > +
> > + cxlmd = container_of(work, typeof(*cxlmd), detach_work);
> > + device_release_driver(&cxlmd->dev);
> > + put_device(&cxlmd->dev);
> > +}
> > +
> > static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds,
> > const struct file_operations *fops)
> > {
> > @@ -237,6 +252,7 @@ static struct cxl_memdev *cxl_memdev_alloc(struct
> > cxl_dev_state *cxlds,
> > dev->devt = MKDEV(cxl_mem_major, cxlmd->id);
> > dev->type = &cxl_memdev_type;
> > device_set_pm_not_required(dev);
> > + INIT_WORK(&cxlmd->detach_work, detach_memdev);
> >
> > cdev = &cxlmd->cdev;
> > cdev_init(cdev, fops);
> > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> > index b61957636907..75a66540a795 100644
> > --- a/drivers/cxl/core/port.c
> > +++ b/drivers/cxl/core/port.c
> > @@ -1,6 +1,7 @@
> > // SPDX-License-Identifier: GPL-2.0-only
> > /* Copyright(c) 2020 Intel Corporation. All rights reserved. */
> > #include <linux/io-64-nonatomic-lo-hi.h>
> > +#include <linux/workqueue.h>
> > #include <linux/device.h>
> > #include <linux/module.h>
> > #include <linux/pci.h>
> > @@ -46,6 +47,8 @@ static int cxl_device_id(struct device *dev)
> > return CXL_DEVICE_ROOT;
> > return CXL_DEVICE_PORT;
> > }
> > + if (dev->type == &cxl_memdev_type)
>
> is_cxl_memdev() ?
> Having dong that, is there any need to expose cxl_memdev_type?
>
Same as above.
> > + return CXL_DEVICE_MEMORY_EXPANDER;
> > return 0;
> > }
> >
> > @@ -320,8 +323,10 @@ static void unregister_port(void *_port)
> > {
> > struct cxl_port *port = _port;
> >
> > - if (!is_cxl_root(port))
> > + if (!is_cxl_root(port)) {
> > device_lock_assert(port->dev.parent);
> > + port->uport = NULL;
> > + }
> >
> > device_unregister(&port->dev);
> > }
>
> ...
>
> > +static void delete_endpoint(void *data)
> > +{
> > + struct cxl_memdev *cxlmd = data;
> > + struct cxl_port *endpoint = dev_get_drvdata(&cxlmd->dev);
> > + struct cxl_port *parent_port;
> > + struct device *parent;
> > +
> > + parent_port = cxl_mem_find_port(cxlmd);
> > + if (!parent_port)
> > + return;
> > + parent = &parent_port->dev;
> > +
> > + cxl_device_lock(parent);
> > + if (parent->driver && endpoint->uport) {
> > + devm_release_action(parent, cxl_unlink_uport, endpoint);
> > + devm_release_action(parent, unregister_port, endpoint);
> > + }
> > + cxl_device_unlock(parent);
> > + put_device(parent);
> > + put_device(&endpoint->dev);
> > +}
> > +
> > +int cxl_endpoint_autoremove(struct cxl_memdev *cxlmd, struct cxl_port
> > *endpoint)
> > +{
> > + struct device *dev = &cxlmd->dev;
> > +
> > + get_device(&endpoint->dev);
> > + dev_set_drvdata(dev, endpoint);
>
> That's a little nasty if it's just to provide
> a second parameter to delete_endpoint, but I guess nothing else was using the
> drvdata..
I think Dan authored this bit. In my region series I cache the port that the
endpoint created and it could be used instead. I haven't actually looked at the
feedback from Dan on that patch yet.
All the comments below this seem reasonable to me.
>
> > + return devm_add_action_or_reset(dev, delete_endpoint, cxlmd);
> > +}
> > +EXPORT_SYMBOL_NS_GPL(cxl_endpoint_autoremove, CXL);
>
> ...
>
>
> > +static BUS_ATTR_WO(flush);
> > +
> > static __init int cxl_core_init(void)
> > {
> > int rc;
> > @@ -1329,12 +1395,27 @@ static __init int cxl_core_init(void)
> > if (rc)
> > return rc;
> >
> > + cxl_bus_wq = alloc_ordered_workqueue("cxl_port", 0);
> > + if (!cxl_bus_wq) {
> > + rc = -ENOMEM;
> > + goto err_wq;
> > + }
> > +
> > rc = bus_register(&cxl_bus_type);
> > if (rc)
> > - goto err;
> > + goto err_bus;
> > +
> > + rc = bus_create_file(&cxl_bus_type, &bus_attr_flush);
>
> Can't we add this as part of the bus_type? Always good to avoid
> dynamic sysfs file creation if we possibly can.
>
> > + if (rc)
> > + goto err_flush;
> > +
> > return 0;
> >
> > -err:
> > +err_flush:
> > + bus_unregister(&cxl_bus_type);
> > +err_bus:
> > + destroy_workqueue(cxl_bus_wq);
> > +err_wq:
> > cxl_memdev_exit();
> > cxl_mbox_exit();
> > return rc;
> > @@ -1342,7 +1423,9 @@ static __init int cxl_core_init(void)
> >
> > static void cxl_core_exit(void)
> > {
> > + bus_remove_file(&cxl_bus_type, &bus_attr_flush);
> > bus_unregister(&cxl_bus_type);
> > + destroy_workqueue(cxl_bus_wq);
> > cxl_memdev_exit();
> > cxl_mbox_exit();
> > }
> > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > index b71d40b68ccd..0bbe394f2f26 100644
> > --- a/drivers/cxl/cxl.h
> > +++ b/drivers/cxl/cxl.h
> > @@ -323,6 +323,8 @@ struct cxl_port *devm_cxl_add_port(struct device *host,
> > struct device *uport,
> > struct cxl_port *find_cxl_root(struct device *dev);
> > int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd);
> > int cxl_bus_rescan(void);
> > +struct cxl_port *cxl_mem_find_port(struct cxl_memdev *cxlmd);
>
> Should be in previous patch where the function is defined.
>
> > +bool schedule_cxl_memdev_detach(struct cxl_memdev *cxlmd);
> >
>
> > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > index 0ba0cf8dcdbc..7ba0edb4a1ab 100644
> > --- a/drivers/cxl/cxlmem.h
> > +++ b/drivers/cxl/cxlmem.h
> > @@ -35,11 +35,14 @@
> > * @cdev: char dev core object for ioctl operations
> > * @cxlds: The device state backing this device
> > * @id: id number of this memdev instance.
> > + * @detach_work: active memdev lost a port in its ancestry
> > + * @component_reg_phys: register base of component registers
>
> ?
>
> > */
> > struct cxl_memdev {
> > struct device dev;
> > struct cdev cdev;
> > struct cxl_dev_state *cxlds;
> > + struct work_struct detach_work;
> > int id;
> > };
> >
> > @@ -48,6 +51,12 @@ static inline struct cxl_memdev *to_cxl_memdev(struct
> > device *dev)
> > return container_of(dev, struct cxl_memdev, dev);
> > }
> >
> > +bool is_cxl_memdev(struct device *dev);
> > +static inline bool is_cxl_endpoint(struct cxl_port *port)
> > +{
> > + return is_cxl_memdev(port->uport);
> > +}
> > +
> > struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds);
> >
> > /**
> > diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> > new file mode 100644
> > index 000000000000..27f9dd0d55b6
> > --- /dev/null
> > +++ b/drivers/cxl/mem.c
> > @@ -0,0 +1,222 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/* Copyright(c) 2022 Intel Corporation. All rights reserved. */
> > +#include <linux/device.h>
> > +#include <linux/module.h>
> > +#include <linux/pci.h>
> > +
> > +#include "cxlmem.h"
> > +#include "cxlpci.h"
> > +
> > +/**
> > + * DOC: cxl mem
> > + *
> > + * CXL memory endpoint devices and switches are CXL capable devices that
> > are
> > + * participating in CXL.mem protocol. Their functionality builds on top of
> > the
> > + * CXL.io protocol that allows enumerating and configuring components via
> > + * standard PCI mechanisms.
> > + *
> > + * The cxl_mem driver owns kicking off the enumeration of this CXL.mem
> > + * capability. With the detection of a CXL capable endpoint, the driver
> > will
> > + * walk up to find the platform specific port it is connected to, and
> > determine
> > + * if there are intervening switches in the path. If there are switches, a
> > + * secondary action to enumerate those (implemented in cxl_core).
>
> action is to
>
> > Finally the
> > + * cxl_mem driver will add the device it is bound to as a CXL port for use
> > in
> > + * higher level operations.
> > + */
>
> ...
>
>
> > +
> > +/**
> > + * cxl_dvsec_decode_init() - Setup HDM decoding for the endpoint
> > + * @cxlds: Device state
> > + *
> > + * Additionally, enables global HDM decoding. Warning: don't call this
> > outside
> > + * of probe. Once probe is complete, the port driver owns all access to
> > the HDM
> > + * decoder registers.
> > + *
> > + * Returns: false if DVSEC Ranges are being used instead of HDM decoders;
> > + * otherwise returns true.
>
> It feels like some of the paths below could reflect other problems
> rather than an intention to use ranges. Maybe this nees to have separate
> error handling from detection of range register usage?
>
> > + */
> > +__mock bool cxl_dvsec_decode_init(struct cxl_dev_state *cxlds)
> > +{
> > + struct cxl_endpoint_dvsec_info *info = &cxlds->info;
> > + struct cxl_register_map map;
> > + struct cxl_component_reg_map *cmap = &map.component_map;
> > + bool global_enable, do_hdm_init = false;
> > + void __iomem *crb;
> > + u32 global_ctrl;
> > +
> > + /* map hdm decoder */
> > + crb = ioremap(cxlds->component_reg_phys, CXL_COMPONENT_REG_BLOCK_SIZE);
> > + if (!crb) {
> > + dev_dbg(cxlds->dev, "Failed to map component registers\n");
> > + return false;
> > + }
> > +
> > + cxl_probe_component_regs(cxlds->dev, crb, cmap);
> > + if (!cmap->hdm_decoder.valid) {
> > + dev_dbg(cxlds->dev, "Invalid HDM decoder registers\n");
> > + goto out;
> > + }
> > +
> > + global_ctrl = readl(crb + cmap->hdm_decoder.offset +
> > + CXL_HDM_DECODER_CTRL_OFFSET);
> > + global_enable = global_ctrl & CXL_HDM_DECODER_ENABLE;
> > + if (!global_enable && info->ranges) {
> > + dev_dbg(cxlds->dev, "DVSEC regions\n");
>
> Perhaps worth adding a little more description to that.
> Perhaps
>
> "DVSEC ranges already programmed and HDM decoders not enabled."
>
> > + goto out;
> > + }
> > +
> > + do_hdm_init = true;
> > +
> > + /*
> > + * Turn on global enable now since DVSEC ranges aren't being used and
> > + * we'll eventually want the decoder enabled. This also prevents special
> > + * casing in the port driver since this only applies to endpoints.
>
> Possibly worth saying why it isn't worth turning this off again in the remove
> path...
>
> > + */
> > + if (!global_enable) {
> > + dev_dbg(cxlds->dev, "Enabling HDM decode\n");
> > + writel(global_ctrl | CXL_HDM_DECODER_ENABLE,
> > + crb + cmap->hdm_decoder.offset +
> > + CXL_HDM_DECODER_CTRL_OFFSET);
> > + }
> > +
> > +out:
> > + iounmap(crb);
> > + return do_hdm_init;
> > +}
> > +
>
> > +MODULE_LICENSE("GPL v2");
> > +MODULE_IMPORT_NS(CXL);
> > +MODULE_ALIAS_CXL(CXL_DEVICE_MEMORY_EXPANDER);
> > +MODULE_SOFTDEP("pre: cxl_port");
>
> As above, a 'why' comment for the softdep would be a good to have.
>
> > diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
> > index ae94a537eccc..27ab7f8d122e 100644
> > --- a/drivers/cxl/port.c
> > +++ b/drivers/cxl/port.c
> > @@ -25,16 +25,27 @@
> > * PCIe topology.
> > */
> >
> > +static void schedule_detach(void *cxlmd)
> > +{
> > + schedule_cxl_memdev_detach(cxlmd);
> > +}
> > +
> > static int cxl_port_probe(struct device *dev)
> > {
> > struct cxl_port *port = to_cxl_port(dev);
> > struct cxl_hdm *cxlhdm;
> > int rc;
> >
> > + if (is_cxl_endpoint(port)) {
> > + struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport);
> > +
> > + get_device(&cxlmd->dev);
> > + return devm_add_action_or_reset(dev, schedule_detach, cxlmd);
> > + }
> > +
> > rc = devm_cxl_port_enumerate_dports(port);
> > if (rc < 0)
> > return rc;
> > -
>
> Reasonable to drop this, but not in this patch.
>
>
> > if (rc == 1)
> > return devm_cxl_add_passthrough_decoder(port);
> >
>