Re: [PATCH 7/7] Documentation: Add docs for inject/clear-error commands

Verma, Vishal L Thu, 29 Jan 2026 11:45:44 -0800

On Thu, 2026-01-22 at 14:37 -0600, Ben Cheatham wrote:
> Add man pages for the 'cxl-inject-error' and 'cxl-clear-error' commands.
> These man pages show usage and examples for each of their use cases.
> 
> Reviewed-by: Dave Jiang <[email protected]>
> Signed-off-by: Ben Cheatham <[email protected]>


Sorry to jump in late in the review cycle, but I had some thoughts on
the command interface below.

<snip>
> 
> +
> +cxl-inject-error(1)
> +===================
> +
> +NAME
> +----
> +cxl-inject-error - Inject CXL errors into CXL devices
> +
> +SYNOPSIS
> +--------
> +[verse]
> +'cxl inject-error' <device name> [<options>]
> +
> +WARNING: Error injection can cause system instability and should only be used
> +for debugging hardware and software error recovery flows. Use at your own risk!
> +
> +Inject an error into a CXL device. The types of errors supported depend on the
> +device specified. The types of devices supported are:
> +
> +"Downstream Ports":: A CXL RCH downstream port (dport) or a CXL VH root port.
> +Eligible ports will have their 'protocol_injectable' attribute in 'cxl-list'
> +set to true. Dports are specified by host name ("0000:0e:01.1").
> +"memdevs":: A CXL memory device. Memory devices are specified by device name
> +("mem0"), device id ("0"), and/or host device name ("0000:35:00.0").
> +
> +There are two types of errors which can be injected: CXL protocol errors
> +and device poison.
> +
> +CXL protocol errors can only be used with downstream ports (as defined above).
> +Protocol errors follow the format of "<protocol>-<severity>". For example,
> +a "mem-fatal" error is a CXL.mem fatal protocol error. Protocol errors can be
> +found in the "injectable_protocol_errors" list under a CXL bus object. This
> +list is only available when the CXL debugfs is accessible (normally mounted
> +at "/sys/kernel/debug/cxl"). For example:
> +
> +----
> +
> +# cxl list -B
> +[
> +  {
> +     "bus":"root0",
> +     "provider":"ACPI.CXL",
> +     "injectable_protocol_errors":[
> +       "mem-correctable",
> +       "mem-fatal"
> +     ]
> +  }
> +]
> +
> +----
> +
> +CXL protocol (CXL.cache/mem) error injection requires the platform to support
> +ACPI v6.5+ error injection (EINJ). In addition to platform support, the
> +CONFIG_ACPI_APEI_EINJ and CONFIG_ACPI_APEI_EINJ_CXL kernel configuration options
> +will need to be enabled. For more information, view the Linux kernel documentation
> +on EINJ. Example using the bus output above:
> +
> +----
> +
> +# cxl list -TP
> +[
> +  {
> +    "port":"port1",
> +    "host":"pci0000:e0",
> +    "depth":1,
> +    "decoders_committed":1,
> +    "nr_dports":1,
> +    "dports":[
> +      {
> +        "dport":"0000:e0:01.1",
> +        "alias":"device:02",
> +        "id":0,
> +        "protocol_injectable":true
> +      }
> +    ]
> +  }
> +]
> +
> +# cxl inject-error "0000:e0:01.1" -t mem-correctable
> +cxl inject-error: inject_proto_err: injected mem-correctable protocol error.
> +
> +----
> +
> +Device poison can only be used with CXL memory devices. A device physical address
> +(DPA) is required to do poison injection. DPAs range from 0 to the size of
> +the device's memory, which can be found using 'cxl-list'. An example injection:
> +
> +----
> +
> +# cxl inject-error mem0 -t poison -a 0x1000
> +poison injected at mem0:0x1000
> +# cxl list -m mem0 -u --media-errors
> +{
> +  "memdev":"mem0",
> +  "ram_size":"256.00 MiB (268.44 MB)",
> +  "serial":"0",
> +  "host":"0000:0d:00.0",
> +  "firmware_version":"BWFW VERSION 00",
> +  "media_errors":[
> +    {
> +      "offset":"0x1000",
> +      "length":64,
> +      "source":"Injected"
> +    }
> +  ]
> +}
> +
> +----

It feels to me like the two injection 'modes' should really be two
separate commands, especially since they act on different classes of
targets.

So essentially, split both the injection and clear commands into:

inject-protocol-error
inject-media-error
clear-protocol-error
clear-media-error

That way the target operands for them are well defined - i.e. port
objects for protocol errors and memdevs for media errors.
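
To illustrate the split, here is a rough sketch in Python (purely for
illustration; cxl-cli itself is C). The command names are the ones
proposed above, and the table, the class labels "port" and "memdev",
and check_target() are all hypothetical. Nothing here exists in cxl-cli
today.

```python
# Hypothetical table: each proposed command accepts exactly one class of
# target, so the operand check becomes a simple lookup.
COMMAND_TARGETS = {
    "inject-protocol-error": "port",
    "clear-protocol-error": "port",
    "inject-media-error": "memdev",
    "clear-media-error": "memdev",
}


def check_target(command, target_class):
    """Return True if target_class is the class the command operates on.

    Raises ValueError for a mismatched target and KeyError for a command
    that is not in the table.
    """
    expected = COMMAND_TARGETS[command]
    if target_class != expected:
        raise ValueError(
            f"{command} expects a {expected}, got a {target_class}"
        )
    return True
```

With one target class per command, a mismatched operand can be rejected
up front instead of depending on which -t value was passed.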


Another thing - and I'm not too attached to either way for this -

The -t 'long-string' feels a bit awkward. Could it be split into
something like:

  --target={mem,cache} --type={correctable,uncorrectable,fatal}

And then 'compose' the actual thing being injected from those options?
Or is that unnecessary gymnastics?
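
For what it's worth, the 'compose' step would be small. A minimal
sketch, again in Python only for illustration: the option values are the
ones suggested above, the result follows the "<protocol>-<severity>"
format from the man page, and compose_error_name() plus the optional
check against the bus's "injectable_protocol_errors" list are my own
assumptions, not existing cxl-cli code.

```python
# Values taken from the proposed --target and --type options.
PROTOCOLS = ("mem", "cache")
SEVERITIES = ("correctable", "uncorrectable", "fatal")


def compose_error_name(target, err_type, injectable=None):
    """Build a "<protocol>-<severity>" name from the two option values.

    If `injectable` is given (for example the "injectable_protocol_errors"
    list reported for a bus), the composed name must appear in it.
    """
    if target not in PROTOCOLS:
        raise ValueError(f"unknown target: {target}")
    if err_type not in SEVERITIES:
        raise ValueError(f"unknown type: {err_type}")
    name = f"{target}-{err_type}"
    if injectable is not None and name not in injectable:
        raise ValueError(f"{name} is not injectable on this bus")
    return name
```

So --target=mem --type=fatal would compose to "mem-fatal", the same
string that -t takes today.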

Re: [PATCH 7/7] Documentation: Add docs for inject/clear-error commands
