On Mon, Jun 02, 2025 at 03:56:26PM -0500, Ben Cheatham wrote:
> v2 Changes:
> - Make the --clear option of 'inject-error' its own command (Alison)
> - Debugfs is now found using the /proc/mount entry instead of
>   providing the path using a --debugfs option
> - Man page added for 'clear-error'
> - Reword commit descriptions for clarity
>
> This series adds support for injecting CXL protocol (CXL.cache/mem)
> errors[1] into CXL RCH Downstream ports and VH root ports[2] and
> poison into CXL memory devices through the CXL debugfs. Errors are
> injected using a new 'inject-error' command, while errors are reported
> using a new cxl-list "-N"/"--injectable-errors" option. Device poison
> can be cleared using the 'clear-error' command.
>
> The 'inject-error'/'clear-error' commands and "-N" option of cxl-list all
> require access to the CXL driver's debugfs.
>
> The documentation for the new cxl-inject-error command shows both usage
> and the possible device/error types, as well as how to retrieve them
> using cxl-list. The documentation for cxl-list has also been updated to
> show the usage of the new injectable errors option.
>
> [1]: ACPI v6.5 spec, section 18.6.4
> [2]: ACPI v6.5 spec, table 18.31
>
> --
>
> Alison, I reached out to Junhyeok about his poison injection series but
> never heard back, so I've just continued with my original plans for a
> v2.
>
> Quick note: My testing setup is screwed up at the moment, so this
> revision is untested. I'll try to get it fixed for the next revision.
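Side note for anyone scripting against the same requirement: locating debugfs from the mount table (the v2 change quoted above; the kernel file is /proc/mounts) can be sketched in shell roughly as follows. This is a minimal sketch, not the series' own implementation; `find_debugfs` is a hypothetical name, and it assumes the usual six-field /proc/mounts layout and simply takes the first debugfs entry.

```shell
#!/bin/sh
# Hypothetical helper (not from the patch series): print the mount point of
# the first debugfs entry in a mounts table, /proc/mounts by default.
# Each line is: <source> <mount point> <fs type> <options> <dump> <pass>
find_debugfs()
{
	mounts="${1:-/proc/mounts}"
	# Match on the filesystem type (field 3) rather than the source column,
	# because the source string of a debugfs mount is arbitrary.
	# Returns non-zero when no debugfs entry is found.
	awk '$3 == "debugfs" { print $2; found = 1; exit }
	     END { exit !found }' "$mounts"
}
```

Typical use would be `debugfs=$(find_debugfs) || exit 1`, after which the per-memdev entries used by the test live under "$debugfs/cxl".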
I applied this to v82 (it needs a sync-up in libcxl.sym) and ran the
cxl-poison unit test using your new cxl-cli commands instead of writing
to debugfs directly [1]. Works for me. Just thought I'd share that as
proof of life until I review it completely.

Adding more test cases to cxl-poison.sh makes sense for device poison.
I'm wondering about the protocol errors, though. How do we test those?

[1]
diff --git a/test/cxl-poison.sh b/test/cxl-poison.sh
index 6ed890bc666c..41ab670b1094 100644
--- a/test/cxl-poison.sh
+++ b/test/cxl-poison.sh
@@ -68,7 +68,8 @@ inject_poison_sysfs()
 	memdev="$1"
 	addr="$2"

-	echo "$addr" > /sys/kernel/debug/cxl/"$memdev"/inject_poison
+#	echo "$addr" > /sys/kernel/debug/cxl/"$memdev"/inject_poison
+	$CXL inject-error "$memdev" -t poison -a "$addr"
 }

 clear_poison_sysfs()
@@ -76,7 +77,8 @@ clear_poison_sysfs()
 	memdev="$1"
 	addr="$2"

-	echo "$addr" > /sys/kernel/debug/cxl/"$memdev"/clear_poison
+#	echo "$addr" > /sys/kernel/debug/cxl/"$memdev"/clear_poison
+	$CXL clear-error "$memdev" -a "$addr"
 }

While applying this patch:

  Documentation: Add docs for inject/clear-error commands

I got these whitespace complaints:

234: new blank line at EOF
158: space before tab in indent.
	"offset":"0x1000",
159: space before tab in indent.
	"length":64,
160: space before tab in indent.
	"source":"Injected"

-- snip
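To catch these locally before resending, `git diff --check` is the real tool. The two complaints above can also be approximated on a plain file with a small awk-based checker like the hypothetical `check_ws` below. It is a sketch, not git's own logic, and it reports file line numbers rather than the patch line numbers shown in the git output above.

```shell
#!/bin/sh
# Hypothetical checker approximating two of git's default whitespace rules
# (space-before-tab and blank-at-eof) on a single file given as $1.
# Line numbers refer to the file, not to a patch.
check_ws()
{
	awk '
	# A space immediately followed by a tab within the leading whitespace.
	/^[ \t]* \t/ { printf "%d: space before tab in indent.\n", FNR }
	{ last = $0 }
	# An empty or whitespace-only final record means a trailing blank line.
	END { if (NR > 0 && last ~ /^[ \t]*$/) printf "%d: blank line at EOF\n", NR }
	' "$1"
}
```

Running `check_ws Documentation/cxl/cxl-inject-error.txt` (path assumed for illustration) would list any offending lines, one per line of output.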