On 22-01-23 16:28:49, Dan Williams wrote:
> From: Ben Widawsky <[email protected]>
>
> The original driver implementation used the doorbell timeout for the
> Mailbox Interface Ready bit to piggy back off of, since the latter does
> not have a defined timeout. This functionality, introduced in commit
> 8adaf747c9f0 ("cxl/mem: Find device capabilities"), needs improvement as
> the recent "Add Mailbox Ready Time" ECN timeout indicates that the
> mailbox ready time can be significantly longer that 2 seconds.
>
> While the specification limits the maximum timeout to 256s, the cxl_pci
> driver gives up on the mailbox after 60s. This value corresponds with
> important timeout values already present in the kernel. A module
> parameter is provided as an emergency override and represents the
> default Linux policy for all devices.
>
> Signed-off-by: Ben Widawsky <[email protected]>
> Reviewed-by: Jonathan Cameron <[email protected]>
> [djbw: add modparam, drop check_device_status()]
> Co-developed-by: Dan Williams <[email protected]>
> Signed-off-by: Dan Williams <[email protected]>
> ---
> drivers/cxl/pci.c | 35 +++++++++++++++++++++++++++++++++++
> 1 file changed, 35 insertions(+)
>
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 8dc91fd3396a..ed8de9eac970 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -1,7 +1,9 @@
> // SPDX-License-Identifier: GPL-2.0-only
> /* Copyright(c) 2020 Intel Corporation. All rights reserved. */
> #include <linux/io-64-nonatomic-lo-hi.h>
> +#include <linux/moduleparam.h>
> #include <linux/module.h>
> +#include <linux/delay.h>
> #include <linux/sizes.h>
> #include <linux/mutex.h>
> #include <linux/list.h>
> @@ -35,6 +37,20 @@
> /* CXL 2.0 - 8.2.8.4 */
> #define CXL_MAILBOX_TIMEOUT_MS (2 * HZ)
>
> +/*
> + * CXL 2.0 ECN "Add Mailbox Ready Time" defines a capability field to
> + * dictate how long to wait for the mailbox to become ready. The new
> + * field allows the device to tell software the amount of time to wait
> + * before mailbox ready. This field per the spec theoretically allows
> + * for up to 255 seconds. 255 seconds is unreasonably long, its longer
> + * than the maximum SATA port link recovery wait. Default to 60 seconds
> + * until someone builds a CXL device that needs more time in practice.
> + */
> +static unsigned short mbox_ready_timeout = 60;
> +module_param(mbox_ready_timeout, ushort, 0600);
Any reason not to make it 0644?
> +MODULE_PARM_DESC(mbox_ready_timeout,
> + "seconds to wait for mailbox ready status");
> +
> static int cxl_pci_mbox_wait_for_doorbell(struct cxl_dev_state *cxlds)
> {
> const unsigned long start = jiffies;
> @@ -281,6 +297,25 @@ static int cxl_pci_mbox_send(struct cxl_dev_state
> *cxlds, struct cxl_mbox_cmd *c
> static int cxl_pci_setup_mailbox(struct cxl_dev_state *cxlds)
> {
> const int cap = readl(cxlds->regs.mbox + CXLDEV_MBOX_CAPS_OFFSET);
> + unsigned long timeout;
> + u64 md_status;
> +
> + timeout = jiffies + mbox_ready_timeout * HZ;
> + do {
> + md_status = readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET);
> + if (md_status & CXLMDEV_MBOX_IF_READY)
> + break;
> + if (msleep_interruptible(100))
> + break;
> + } while (!time_after(jiffies, timeout));
Just pointing out the [probably] obvious. If the user specifies a zero second
timeout, the code will still wait 100ms.
> +
> + if (!(md_status & CXLMDEV_MBOX_IF_READY)) {
> + dev_err(cxlds->dev,
> + "timeout awaiting mailbox ready, device state:%s%s\n",
> + md_status & CXLMDEV_DEV_FATAL ? " fatal" : "",
> + md_status & CXLMDEV_FW_HALT ? " firmware-halt" : "");
> + return -EIO;
> + }
>
> cxlds->mbox_send = cxl_pci_mbox_send;
> cxlds->payload_size =
>