On Mon, Mar 18, 2024 at 02:01:38PM -0700, Dan Williams wrote: > Alison Schofield wrote: > > On Mon, Mar 18, 2024 at 10:51:13AM -0700, fan wrote: > > > On Wed, Mar 13, 2024 at 09:05:17PM -0700, alison.schofi...@intel.com > > > wrote: > > > > From: Alison Schofield <alison.schofi...@intel.com> > > > > > > > > CXL devices maintain a list of locations that are poisoned or result > > > > in poison if the addresses are accessed by the host. > > > > > > > > Per the spec (CXL 3.1 8.2.9.9.4.1), the device returns the Poison > > > > List as a set of Media Error Records that include the source of the > > > > error, the starting device physical address and length. > > > > > > > > Trigger the retrieval of the poison list by writing to the memory > > > > device sysfs attribute: trigger_poison_list. The CXL driver only > > > > offers triggering per memdev, so the trigger by region interface > > > > offered here is a convenience API that triggers a poison list > > > > retrieval for each memdev contributing to a region. > > > > > > > > int cxl_memdev_trigger_poison_list(struct cxl_memdev *memdev); > > > > int cxl_region_trigger_poison_list(struct cxl_region *region); > > > > > > > > The resulting poison records are logged as kernel trace events > > > > named 'cxl_poison'. > > > > > > > > Signed-off-by: Alison Schofield <alison.schofi...@intel.com> > > > > Reviewed-by: Dave Jiang <dave.ji...@intel.com> > > > > --- > > > > cxl/lib/libcxl.c | 47 ++++++++++++++++++++++++++++++++++++++++++++++ > > > > cxl/lib/libcxl.sym | 2 ++ > > > > cxl/libcxl.h | 2 ++ > > > > 3 files changed, 51 insertions(+) > > > > > > > > diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c > > > > index ff27cdf7c44a..73db8f15c704 100644 > > > > --- a/cxl/lib/libcxl.c > > > > +++ b/cxl/lib/libcxl.c > > > > @@ -1761,6 +1761,53 @@ CXL_EXPORT int > > > > cxl_memdev_disable_invalidate(struct cxl_memdev *memdev) > > > > return 0; > > > > } > > > > > > > > +CXL_EXPORT int cxl_memdev_trigger_poison_list(struct cxl_memdev > > > > *memdev) > > > > +{ > > > > + struct cxl_ctx *ctx = cxl_memdev_get_ctx(memdev); > > > > + char *path = memdev->dev_buf; > > > > + int len = memdev->buf_len, rc; > > > > + > > > > + if (snprintf(path, len, "%s/trigger_poison_list", > > > > + memdev->dev_path) >= len) { > > > > + err(ctx, "%s: buffer too small\n", > > > > + cxl_memdev_get_devname(memdev)); > > > > + return -ENXIO; > > > > + } > > > > + rc = sysfs_write_attr(ctx, path, "1\n"); > > > > + if (rc < 0) { > > > > + fprintf(stderr, > > > > + "%s: Failed write sysfs attr > > > > trigger_poison_list\n", > > > > + cxl_memdev_get_devname(memdev)); > > > > > > Should we use err() instead of fprintf here? > > > > Thanks Fan, > > > > How about this? > > > > - use fprintf if access() fails, ie device doesn't support poison list, > > - use err() for failure to actually read the poison list on a device with > > support > > Why? There is no raw usage of fprintf in any of the libraries (ndctl, > daxctl, cxl) to date. If someone builds the library without logging then > it should not chat on stderr at all, and if someone redirects logging to > syslog then it also should emit messages only there and not stderr.
Why indeed :( I'll remove the fprintf() and only use err() for both cases: device doesn't support feature, or failure to read list.