From: Alison Schofield <[email protected]>
This is labeled RFC because it builds upon in-flight patchsets,
making it unlikely that others can try it out. It depends upon the
tracing support in Dave's monitor patchset [1] and the kernel
driver support for poison in this patchset [2].
The first patch adds a libcxl API for triggering the read of a
poison list from a memory device. Users of that API will need to
trace the kernel events to collect the error records.
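Below is a minimal sketch of what triggering a poison list read might look like at the sysfs level. It assumes the kernel patchset [2] exposes a writable per-memdev attribute named trigger_poison_list. That attribute name, the path layout, and the helpers build_trigger_path() and trigger_poison_list() are hypothetical; they are not the libcxl API added in patch 1. The write itself returns no records. The records arrive only as cxl_poison trace events, which is why callers must trace the kernel events.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical: build the sysfs path of a memdev's poison-list trigger.
 * The attribute name "trigger_poison_list" is an assumption; the kernel
 * patchset defines the real interface.
 */
static int build_trigger_path(const char *memdev, char *buf, size_t len)
{
	int n = snprintf(buf, len,
			 "/sys/bus/cxl/devices/%s/trigger_poison_list", memdev);

	return (n < 0 || (size_t)n >= len) ? -ENAMETOOLONG : 0;
}

/*
 * Hypothetical: ask the kernel to read the poison list by writing "1" to
 * the trigger attribute. Returns 0 or a negative errno. No records come
 * back through this file; the kernel emits them as cxl_poison trace events.
 */
static int trigger_poison_list(const char *attr_path)
{
	const char *val = "1\n";
	size_t len = strlen(val);
	int rc = 0;

	assert(attr_path != NULL);
	int fd = open(attr_path, O_WRONLY);
	if (fd < 0)
		return -errno;

	ssize_t n = write(fd, val, len);
	if (n < 0)
		rc = -errno;	/* capture errno before close() can change it */
	else if ((size_t)n != len)
		rc = -EIO;	/* short write */
	close(fd);
	return rc;
}
```

A caller would build the path for a device such as mem2, call trigger_poison_list() on it, and then collect the resulting cxl_poison events from the trace buffer.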
Patches 2 and 3 add a convenience option, --media-errors, to 'cxl list'.
With it, the poison list is read, the results are collected and parsed,
and the media error records are included in the JSON list output.
The JSON output of 'cxl list' does not include all the same fields
that are available in the 'cxl_poison' trace event.
The 'cxl_poison' trace event always includes these fields:
region, memdev, pcidev, hpa, dpa, length, source, flags, overflow_time
'cxl list --media-errors' omits fields that are redundant in the
context of the cxl list command:
- Do not repeat the memdev, region, or pcidevs that are
already included in the list output.
- Include 'hpa' only when media errors are listed by region.
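The filtering rules above can be sketched as follows. struct poison_rec, enum list_ctx, and format_rec() are hypothetical names. The sketch uses plain snprintf() rather than a JSON library, so it only illustrates which fields survive in each listing context: memdev and hpa appear in a record only when listing by region, and region and pcidev never appear in a record.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record mirroring the cxl_poison trace event fields. */
struct poison_rec {
	const char *region, *memdev, *pcidev;
	unsigned long long hpa, dpa, length;
	const char *source, *flags;
	unsigned long long overflow_time;
};

/* Which object the media errors are nested under. */
enum list_ctx { LIST_BY_MEMDEV, LIST_BY_REGION };

/*
 * Format one record as a compact JSON object into buf. Returns the
 * string length, or -1 if buf is too small. region and pcidev are never
 * emitted; memdev and hpa are emitted only when listing by region.
 */
static int format_rec(const struct poison_rec *r, enum list_ctx ctx,
		      char *buf, size_t len)
{
	int n;

	assert(r != NULL && buf != NULL);
	if (ctx == LIST_BY_REGION)
		/* the parent object names the region, but not the memdev */
		n = snprintf(buf, len, "{\"memdev\":\"%s\",\"hpa\":%llu,",
			     r->memdev, r->hpa);
	else
		/* the parent object already names the memdev */
		n = snprintf(buf, len, "{");
	if (n < 0 || (size_t)n >= len)
		return -1;

	int m = snprintf(buf + n, len - (size_t)n,
			 "\"dpa\":%llu,\"length\":%llu,\"source\":\"%s\","
			 "\"flags\":\"%s\",\"overflow_time\":%llu}",
			 r->dpa, r->length, r->source, r->flags,
			 r->overflow_time);
	if (m < 0 || (size_t)m >= len - (size_t)n)
		return -1;
	return n + m;
}
```

Listing the same record once by memdev and once by region shows the difference: only the region form carries "memdev" and "hpa".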
Examples:
# cxl list -m mem2 --media-errors
[
  {
    "memdev":"mem2",
    "pmem_size":1073741824,
    "ram_size":0,
    "serial":2,
    "host":"cxl_mem.2",
    "media_errors":{
      "nr media-errors":2,
      "media-error records":[
        {
          "dpa":64,
          "length":128,
          "source":"Injected",
          "flags":"Overflow,",
          "overflow_time":1656711046
        },
        {
          "dpa":192,
          "length":192,
          "source":"Internal",
          "flags":"Overflow,",
          "overflow_time":1656711046
        }
      ]
    }
  }
]
# cxl list -r region5 --media-errors
[
  {
    "region":"region5",
    "resource":1035623989248,
    "size":2147483648,
    "interleave_ways":2,
    "interleave_granularity":4096,
    "decode_state":"commit",
    "media_errors":{
      "nr media-errors":2,
      "media-error records":[
        {
          "memdev":"mem2",
          "hpa":0,
          "dpa":0,
          "length":64,
          "source":"Reserved",
          "flags":"",
          "overflow_time":0
        },
        {
          "memdev":"mem5",
          "hpa":0,
          "dpa":384,
          "length":256,
          "source":"Injected",
          "flags":"",
          "overflow_time":0
        }
      ]
    }
  }
]
[1] https://lore.kernel.org/nvdimm/166363103019.3861186.3067220004819656109.st...@djiang5-desk3.ch.intel.com/
[2] https://lore.kernel.org/linux-cxl/[email protected]/
Alison Schofield (3):
libcxl: add interfaces for GET_POISON_LIST mailbox commands
cxl/list: collect and parse the poison list records
cxl/list: add --media-errors option to cxl list
Documentation/cxl/cxl-list.txt | 66 +++++++++++
cxl/filter.c | 2 +
cxl/filter.h | 1 +
cxl/json.c | 197 +++++++++++++++++++++++++++++++++
cxl/lib/libcxl.c | 40 +++++++
cxl/lib/libcxl.sym | 6 +
cxl/libcxl.h | 2 +
cxl/list.c | 2 +
8 files changed, 316 insertions(+)
--
2.37.3