Add man pages for the 'cxl-inject-error' and 'cxl-clear-error' commands. These man pages show usage and examples for each of their use cases.
Signed-off-by: Ben Cheatham <[email protected]>
---
 Documentation/cxl/cxl-clear-error.txt  |  69 +++++++++++
 Documentation/cxl/cxl-inject-error.txt | 161 +++++++++++++++++++++++++
 Documentation/cxl/meson.build          |   2 +
 3 files changed, 232 insertions(+)
 create mode 100644 Documentation/cxl/cxl-clear-error.txt
 create mode 100644 Documentation/cxl/cxl-inject-error.txt

diff --git a/Documentation/cxl/cxl-clear-error.txt b/Documentation/cxl/cxl-clear-error.txt
new file mode 100644
index 0000000..9d77855
--- /dev/null
+++ b/Documentation/cxl/cxl-clear-error.txt
@@ -0,0 +1,69 @@
+// SPDX-License-Identifier: GPL-2.0
+
+cxl-clear-error(1)
+==================
+
+NAME
+----
+cxl-clear-error - Clear CXL errors from CXL devices
+
+SYNOPSIS
+--------
+[verse]
+'cxl clear-error' <device name> [<options>]
+
+Clear an error from a CXL device. The types of devices supported are:
+
+"memdevs":: A CXL memory device. Memory devices are specified by device
+name ("mem0"), device id ("0") and/or host device name ("0000:35:00.0").
+
+Only device poison (viewable using the '-L'/'--media-errors' option of
+'cxl-list') can be cleared from a device using this command. For example:
+
+----
+
+# cxl list -m mem0 -L -u
+{
+  "memdev":"mem0",
+  "ram_size":"1024.00 MiB (1073.74 MB)",
+  "ram_qos_class":42,
+  "serial":"0x0",
+  "numa_node":1,
+  "host":"0000:35:00.0",
+  "media_errors":[
+    {
+      "offset":"0x1000",
+      "length":64,
+      "source":"Injected"
+    }
+  ]
+}
+
+# cxl clear-error mem0 -a 0x1000
+poison cleared at mem0:0x1000
+
+# cxl list -m mem0 -L -u
+{
+  "memdev":"mem0",
+  "ram_size":"1024.00 MiB (1073.74 MB)",
+  "ram_qos_class":42,
+  "serial":"0x0",
+  "numa_node":1,
+  "host":"0000:35:00.0",
+  "media_errors":[
+  ]
+}
+
+----
+
+This command depends on the kernel debug filesystem (debugfs) to clear device poison.
+
+OPTIONS
+-------
+-a::
+--address::
+	Device physical address (DPA) to clear poison from. Address can be specified
+	in hex or decimal. Required for clearing poison.
+
+--debug::
+	Enable debug output
diff --git a/Documentation/cxl/cxl-inject-error.txt b/Documentation/cxl/cxl-inject-error.txt
new file mode 100644
index 0000000..80d03be
--- /dev/null
+++ b/Documentation/cxl/cxl-inject-error.txt
@@ -0,0 +1,161 @@
+// SPDX-License-Identifier: GPL-2.0
+
+cxl-inject-error(1)
+===================
+
+NAME
+----
+cxl-inject-error - Inject CXL errors into CXL devices
+
+SYNOPSIS
+--------
+[verse]
+'cxl inject-error' <device name> [<options>]
+
+WARNING: Error injection can cause system instability and should only be used
+for debugging hardware and software error recovery flows. Use at your own risk!
+
+Inject an error into a CXL device. The types of errors supported depend on the
+device specified. The types of devices supported are:
+
+"Downstream Ports":: A CXL RCH downstream port (dport) or a CXL VH root port.
+Eligible ports will have their 'protocol_injectable' attribute in 'cxl-list'
+set to true. Dports are specified by host name ("0000:0e:01.1").
+"memdevs":: A CXL memory device. Memory devices are specified by device name
+("mem0"), device id ("0"), and/or host device name ("0000:35:00.0").
+
+There are two types of errors which can be injected: CXL protocol errors
+and device poison.
+
+CXL protocol errors can only be used with downstream ports (as defined above).
+Protocol errors follow the format of "<protocol>-<severity>". For example,
+a "mem-fatal" error is a CXL.mem fatal protocol error. Protocol errors can be
+found in the "injectable_protocol_errors" list under a CXL bus object. This
+list is only available when the CXL debugfs is accessible (normally mounted
+at "/sys/kernel/debug/cxl"). For example:
+
+----
+
+# cxl list -B
+[
+  {
+    "bus":"root0",
+    "provider":"ACPI.CXL",
+    "injectable_protocol_errors":[
+      "mem-correctable",
+      "mem-fatal"
+    ]
+  }
+]
+
+----
+
+CXL protocol (CXL.cache/mem) error injection requires the platform to support
+ACPI v6.5+ error injection (EINJ). In addition to platform support, the
+CONFIG_ACPI_APEI_EINJ and CONFIG_ACPI_APEI_EINJ_CXL kernel configuration options
+will need to be enabled. For more information, view the Linux kernel documentation
+on EINJ. Example using the bus output above:
+
+----
+
+# cxl list -TP
+[
+  {
+    "port":"port1",
+    "host":"pci0000:e0",
+    "depth":1,
+    "decoders_committed":1,
+    "nr_dports":1,
+    "dports":[
+      {
+        "dport":"0000:e0:01.1",
+        "alias":"device:02",
+        "id":0,
+        "protocol_injectable":true
+      }
+    ]
+  }
+]
+
+# cxl inject-error "0000:e0:01.1" -t mem-correctable
+cxl inject-error: inject_proto_err: injected mem-correctable protocol error.
+
+----
+
+Device poison can only be used with CXL memory devices. A device physical address
+(DPA) is required to do poison injection. DPAs range from 0 to the size of
+the device's memory, which can be found using 'cxl-list'. An example injection:
+
+----
+
+# cxl inject-error mem0 -t poison -a 0x1000
+poison injected at mem0:0x1000
+# cxl list -m mem0 -u --media-errors
+{
+  "memdev":"mem0",
+  "ram_size":"256.00 MiB (268.44 MB)",
+  "serial":"0",
+  "host":"0000:0d:00.0",
+  "firmware_version":"BWFW VERSION 00",
+  "media_errors":[
+    {
+      "offset":"0x1000",
+      "length":64,
+      "source":"Injected"
+    }
+  ]
+}
+
+----
+
+Not all memory devices support poison injection. To see if a device supports
+poison injection through debugfs, use 'cxl-list' and look for the "poison_injectable"
+attribute under the device. This attribute is only available when the CXL debugfs
+is accessible. Example:
+
+----
+
+# cxl list -u -m mem0
+{
+  "memdev":"mem0",
+  "ram_size":"256.00 MiB (268.44 MB)",
+  "serial":"0",
+  "host":"0000:0d:00.0",
+  "firmware_version":"BWFW VERSION 00",
+  "poison_injectable":true
+}
+
+----
+
+This command depends on the kernel debug filesystem (debugfs) to do CXL protocol
+error and device poison injection.
+
+OPTIONS
+-------
+-a::
+--address::
+	Device physical address (DPA) to use for poison injection. Address can
+	be specified in hex or decimal. Required for poison injection.
+
+-t::
+--type::
+	Type of error to inject into <device name>. The type of error is restricted
+	by device type. The following shows the possible types under their associated
+	device type(s):
+----
+
+Downstream Ports: ::
+	cache-correctable, cache-uncorrectable, cache-fatal, mem-correctable,
+	mem-uncorrectable, mem-fatal
+
+Memdevs: ::
+	poison
+
+----
+
+--debug::
+	Enable debug output
+
+SEE ALSO
+--------
+linkcxl:cxl-list[1]
diff --git a/Documentation/cxl/meson.build b/Documentation/cxl/meson.build
index 8085c1c..0b75eed 100644
--- a/Documentation/cxl/meson.build
+++ b/Documentation/cxl/meson.build
@@ -50,6 +50,8 @@ cxl_manpages = [
   'cxl-update-firmware.txt',
   'cxl-set-alert-config.txt',
   'cxl-wait-sanitize.txt',
+  'cxl-inject-error.txt',
+  'cxl-clear-error.txt',
 ]
 
 foreach man : cxl_manpages
-- 
2.52.0
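
The rules that the cxl-inject-error page states for '-t'/'--type' and '-a'/'--address' can be sketched as a small check. This is only an illustration of the documented behavior, not the tool's implementation: the names `allowed_types`, `parse_address`, and `check_request`, and the "dport"/"memdev" kind strings, are hypothetical and do not come from the patch.

```python
# Hypothetical sketch of the option rules described in cxl-inject-error(1).
# None of these names exist in the cxl tool; they are for illustration only.

PROTOCOLS = ("cache", "mem")
SEVERITIES = ("correctable", "uncorrectable", "fatal")

# Protocol errors follow "<protocol>-<severity>" and apply to downstream ports.
DPORT_TYPES = {f"{p}-{s}" for p in PROTOCOLS for s in SEVERITIES}
# Memory devices only accept device poison.
MEMDEV_TYPES = {"poison"}


def allowed_types(device_kind):
    """Return the error types the man page lists for a device kind."""
    if device_kind == "dport":
        return DPORT_TYPES
    if device_kind == "memdev":
        return MEMDEV_TYPES
    raise ValueError(f"unknown device kind: {device_kind}")


def parse_address(text):
    """Parse a DPA given in hex ("0x1000") or decimal ("4096")."""
    # Base 0 lets int() pick the base from the prefix; note that it
    # rejects decimals with leading zeros such as "010".
    return int(text, 0)


def check_request(device_kind, err_type, address=None):
    """Validate an injection request against the documented restrictions."""
    if err_type not in allowed_types(device_kind):
        raise ValueError(f"{err_type} cannot be injected into a {device_kind}")
    if err_type == "poison" and address is None:
        raise ValueError("poison injection requires a DPA")
    return True
```

For instance, `check_request("memdev", "poison", parse_address("0x1000"))` passes, while `check_request("dport", "poison", 0)` raises `ValueError`, matching the restriction that poison applies only to memory devices.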
