On Mon, Mar 18, 2024 at 10:51:13AM -0700, fan wrote:
> On Wed, Mar 13, 2024 at 09:05:17PM -0700, alison.schofi...@intel.com wrote:
> > From: Alison Schofield <alison.schofi...@intel.com>
> > 
> > CXL devices maintain a list of locations that are poisoned or result
> > in poison if the addresses are accessed by the host.
> > 
> > Per the spec (CXL 3.1 8.2.9.9.4.1), the device returns the Poison
> > List as a set of  Media Error Records that include the source of the
> > error, the starting device physical address and length.
> > 
> > Trigger the retrieval of the poison list by writing to the memory
> > device sysfs attribute: trigger_poison_list. The CXL driver only
> > offers triggering per memdev, so the trigger by region interface
> > offered here is a convenience API that triggers a poison list
> > retrieval for each memdev contributing to a region.
> > 
> > int cxl_memdev_trigger_poison_list(struct cxl_memdev *memdev);
> > int cxl_region_trigger_poison_list(struct cxl_region *region);
> > 
> > The resulting poison records are logged as kernel trace events
> > named 'cxl_poison'.
> > 
> > Signed-off-by: Alison Schofield <alison.schofi...@intel.com>
> > Reviewed-by: Dave Jiang <dave.ji...@intel.com>
> > ---
> >  cxl/lib/libcxl.c   | 47 ++++++++++++++++++++++++++++++++++++++++++++++
> >  cxl/lib/libcxl.sym |  2 ++
> >  cxl/libcxl.h       |  2 ++
> >  3 files changed, 51 insertions(+)
> > 
> > diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c
> > index ff27cdf7c44a..73db8f15c704 100644
> > --- a/cxl/lib/libcxl.c
> > +++ b/cxl/lib/libcxl.c
> > @@ -1761,6 +1761,53 @@ CXL_EXPORT int cxl_memdev_disable_invalidate(struct 
> > cxl_memdev *memdev)
> >     return 0;
> >  }
> >  
> > +CXL_EXPORT int cxl_memdev_trigger_poison_list(struct cxl_memdev *memdev)
> > +{
> > +   struct cxl_ctx *ctx = cxl_memdev_get_ctx(memdev);
> > +   char *path = memdev->dev_buf;
> > +   int len = memdev->buf_len, rc;
> > +
> > +   if (snprintf(path, len, "%s/trigger_poison_list",
> > +                memdev->dev_path) >= len) {
> > +           err(ctx, "%s: buffer too small\n",
> > +               cxl_memdev_get_devname(memdev));
> > +           return -ENXIO;
> > +   }
> > +   rc = sysfs_write_attr(ctx, path, "1\n");
> > +   if (rc < 0) {
> > +           fprintf(stderr,
> > +                   "%s: Failed write sysfs attr trigger_poison_list\n",
> > +                   cxl_memdev_get_devname(memdev));
> 
> Should we use err() instead of fprintf here? 

Thanks Fan,

How about this?

- use fprintf if access() fails, ie device doesn't support poison list,
- use err() for failure to actually read the poison list on a device with
  support

Alison


> 
> Fan
> 
> > +           return rc;
> > +   }
> > +   return 0;
> > +}
> > +
> > +CXL_EXPORT int cxl_region_trigger_poison_list(struct cxl_region *region)
> > +{
> > +   struct cxl_memdev_mapping *mapping;
> > +   int rc;
> > +
> > +   cxl_mapping_foreach(region, mapping) {
> > +           struct cxl_decoder *decoder;
> > +           struct cxl_memdev *memdev;
> > +
> > +           decoder = cxl_mapping_get_decoder(mapping);
> > +           if (!decoder)
> > +                   continue;
> > +
> > +           memdev = cxl_decoder_get_memdev(decoder);
> > +           if (!memdev)
> > +                   continue;
> > +
> > +           rc = cxl_memdev_trigger_poison_list(memdev);
> > +           if (rc)
> > +                   return rc;
> > +   }
> > +
> > +   return 0;
> > +}
> > +
> >  CXL_EXPORT int cxl_memdev_enable(struct cxl_memdev *memdev)
> >  {
> >     struct cxl_ctx *ctx = cxl_memdev_get_ctx(memdev);
> > diff --git a/cxl/lib/libcxl.sym b/cxl/lib/libcxl.sym
> > index de2cd84b2960..3f709c60db3d 100644
> > --- a/cxl/lib/libcxl.sym
> > +++ b/cxl/lib/libcxl.sym
> > @@ -280,4 +280,6 @@ global:
> >     cxl_memdev_get_pmem_qos_class;
> >     cxl_memdev_get_ram_qos_class;
> >     cxl_region_qos_class_mismatch;
> > +   cxl_memdev_trigger_poison_list;
> > +   cxl_region_trigger_poison_list;
> >  } LIBCXL_6;
> > diff --git a/cxl/libcxl.h b/cxl/libcxl.h
> > index a6af3fb04693..29165043ca3f 100644
> > --- a/cxl/libcxl.h
> > +++ b/cxl/libcxl.h
> > @@ -467,6 +467,8 @@ enum cxl_setpartition_mode {
> >  
> >  int cxl_cmd_partition_set_mode(struct cxl_cmd *cmd,
> >             enum cxl_setpartition_mode mode);
> > +int cxl_memdev_trigger_poison_list(struct cxl_memdev *memdev);
> > +int cxl_region_trigger_poison_list(struct cxl_region *region);
> >  
> >  int cxl_cmd_alert_config_set_life_used_prog_warn_threshold(struct cxl_cmd 
> > *cmd,
> >                                                        int threshold);
> > -- 
> > 2.37.3
> > 

Reply via email to