Re: [RFC v2 5/5] drm/xe/RAS: send multicast event on occurrence of an error
On 11/12/23 20:58, Tomer Tayar wrote:
> On 10/11/2023 14:27, Tomer Tayar wrote:
>> On 20/10/2023 18:58, Aravind Iddamsetty wrote:
>>> Whenever a correctable or an uncorrectable error happens an event is sent
>>> to the corresponding listeners of these groups.
>>>
>>> v2: Rebase
>>>
>>> Signed-off-by: Aravind Iddamsetty
>>> ---
>>>  drivers/gpu/drm/xe/xe_hw_error.c | 33
>>>  1 file changed, 33 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_hw_error.c b/drivers/gpu/drm/xe/xe_hw_error.c
>>> index bab6d4cf0b69..b0befb5e01cb 100644
>>> --- a/drivers/gpu/drm/xe/xe_hw_error.c
>>> +++ b/drivers/gpu/drm/xe/xe_hw_error.c
>>> @@ -786,6 +786,37 @@ xe_soc_hw_error_handler(struct xe_tile *tile, const enum hardware_error hw_err)
>>>  			(HARDWARE_ERROR_MAX << 1) + 1);
>>>  }
>>>
>>> +static void
>>> +generate_netlink_event(struct xe_device *xe, const enum hardware_error hw_err)
>>> +{
>>> +	struct sk_buff *msg;
>>> +	void *hdr;
>>> +
>>> +	if (!xe->drm.drm_genl_family.module)
>>> +		return;
>>> +
>>> +	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_ATOMIC);
>>> +	if (!msg) {
>>> +		drm_dbg_driver(&xe->drm, "couldn't allocate memory for error multicast event\n");
>>> +		return;
>>> +	}
>>> +
>>> +	hdr = genlmsg_put(msg, 0, 0, &xe->drm.drm_genl_family, 0, DRM_RAS_CMD_ERROR_EVENT);
>>> +	if (!hdr) {
>>> +		drm_dbg_driver(&xe->drm, "multicast msg buffer is small\n");
>>> +		nlmsg_free(msg);
>>> +		return;
>>> +	}
>>> +
>>> +	genlmsg_end(msg, hdr);
>>> +
>>> +	genlmsg_multicast(&xe->drm.drm_genl_family, msg, 0,
>>> +			  hw_err ?
>>> +			  DRM_GENL_MCAST_UNCORR_ERR
>>> +			  : DRM_GENL_MCAST_CORR_ERR,
>>> +			  GFP_ATOMIC);
>> I agree that hiding/wrapping any netlink/genetlink API/macro with a DRM
>> helper would sometimes be redundant, and that in some cases the specific
>> DRM driver would have to "dirty its hands" and deal with netlink (e.g.
>> fill_error_details() in patch #3).
>> However, maybe here a DRM helper would have been useful, so that we won't
>> see a copy of this sequence in other DRM drivers?
>>
>> Thanks,
>> Tomer
> After rethinking, it is possible that different DRM drivers will need
> some flexibility when it comes to calling genlmsg_put(), as they might
> want to do more around this call in order to attach some data related to
> the error indication.
> In that case, adding a DRM function that wraps it may be redundant.
> What do you think?

I think we can expose this base-level call to every DRM driver, and if a
driver wants to add a custom message it can define its own helper. That
should be OK, I believe.

Thanks,
Aravind.
>
>>> +}
>>> +
>>>  static void
>>>  xe_hw_error_source_handler(struct xe_tile *tile, const enum hardware_error hw_err)
>>>  {
>>> @@ -849,6 +880,8 @@ xe_hw_error_source_handler(struct xe_tile *tile, const enum hardware_error hw_er
>>>  	}
>>>
>>>  	xe_mmio_write32(gt, DEV_ERR_STAT_REG(hw_err), errsrc);
>>> +
>>> +	generate_netlink_event(tile_to_xe(tile), hw_err);
>>>  unlock:
>>>  	spin_unlock_irqrestore(&tile_to_xe(tile)->irq.lock, flags);
>>>  }
Re: [RFC v2 5/5] drm/xe/RAS: send multicast event on occurrence of an error
On 10/11/2023 14:27, Tomer Tayar wrote:
> On 20/10/2023 18:58, Aravind Iddamsetty wrote:
>> Whenever a correctable or an uncorrectable error happens an event is sent
>> to the corresponding listeners of these groups.
>>
>> v2: Rebase
>>
>> Signed-off-by: Aravind Iddamsetty
>> ---
>>  drivers/gpu/drm/xe/xe_hw_error.c | 33
>>  1 file changed, 33 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_hw_error.c b/drivers/gpu/drm/xe/xe_hw_error.c
>> index bab6d4cf0b69..b0befb5e01cb 100644
>> --- a/drivers/gpu/drm/xe/xe_hw_error.c
>> +++ b/drivers/gpu/drm/xe/xe_hw_error.c
>> @@ -786,6 +786,37 @@ xe_soc_hw_error_handler(struct xe_tile *tile, const enum hardware_error hw_err)
>>  			(HARDWARE_ERROR_MAX << 1) + 1);
>>  }
>>
>> +static void
>> +generate_netlink_event(struct xe_device *xe, const enum hardware_error hw_err)
>> +{
>> +	struct sk_buff *msg;
>> +	void *hdr;
>> +
>> +	if (!xe->drm.drm_genl_family.module)
>> +		return;
>> +
>> +	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_ATOMIC);
>> +	if (!msg) {
>> +		drm_dbg_driver(&xe->drm, "couldn't allocate memory for error multicast event\n");
>> +		return;
>> +	}
>> +
>> +	hdr = genlmsg_put(msg, 0, 0, &xe->drm.drm_genl_family, 0, DRM_RAS_CMD_ERROR_EVENT);
>> +	if (!hdr) {
>> +		drm_dbg_driver(&xe->drm, "multicast msg buffer is small\n");
>> +		nlmsg_free(msg);
>> +		return;
>> +	}
>> +
>> +	genlmsg_end(msg, hdr);
>> +
>> +	genlmsg_multicast(&xe->drm.drm_genl_family, msg, 0,
>> +			  hw_err ?
>> +			  DRM_GENL_MCAST_UNCORR_ERR
>> +			  : DRM_GENL_MCAST_CORR_ERR,
>> +			  GFP_ATOMIC);
> I agree that hiding/wrapping any netlink/genetlink API/macro with a DRM
> helper would sometimes be redundant, and that in some cases the specific
> DRM driver would have to "dirty its hands" and deal with netlink (e.g.
> fill_error_details() in patch #3).
> However, maybe here a DRM helper would have been useful, so that we won't
> see a copy of this sequence in other DRM drivers?
>
> Thanks,
> Tomer

After rethinking, it is possible that different DRM drivers will need
some flexibility when it comes to calling genlmsg_put(), as they might
want to do more around this call in order to attach some data related to
the error indication.
In that case, adding a DRM function that wraps it may be redundant.
What do you think?

>> +}
>> +
>>  static void
>>  xe_hw_error_source_handler(struct xe_tile *tile, const enum hardware_error hw_err)
>>  {
>> @@ -849,6 +880,8 @@ xe_hw_error_source_handler(struct xe_tile *tile, const enum hardware_error hw_er
>>  	}
>>
>>  	xe_mmio_write32(gt, DEV_ERR_STAT_REG(hw_err), errsrc);
>> +
>> +	generate_netlink_event(tile_to_xe(tile), hw_err);
>>  unlock:
>>  	spin_unlock_irqrestore(&tile_to_xe(tile)->irq.lock, flags);
>>  }
Re: [RFC v2 5/5] drm/xe/RAS: send multicast event on occurrence of an error
On 20/10/2023 18:58, Aravind Iddamsetty wrote:
> Whenever a correctable or an uncorrectable error happens an event is sent
> to the corresponding listeners of these groups.
>
> v2: Rebase
>
> Signed-off-by: Aravind Iddamsetty
> ---
>  drivers/gpu/drm/xe/xe_hw_error.c | 33
>  1 file changed, 33 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_hw_error.c b/drivers/gpu/drm/xe/xe_hw_error.c
> index bab6d4cf0b69..b0befb5e01cb 100644
> --- a/drivers/gpu/drm/xe/xe_hw_error.c
> +++ b/drivers/gpu/drm/xe/xe_hw_error.c
> @@ -786,6 +786,37 @@ xe_soc_hw_error_handler(struct xe_tile *tile, const enum hardware_error hw_err)
>  			(HARDWARE_ERROR_MAX << 1) + 1);
>  }
>
> +static void
> +generate_netlink_event(struct xe_device *xe, const enum hardware_error hw_err)
> +{
> +	struct sk_buff *msg;
> +	void *hdr;
> +
> +	if (!xe->drm.drm_genl_family.module)
> +		return;
> +
> +	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_ATOMIC);
> +	if (!msg) {
> +		drm_dbg_driver(&xe->drm, "couldn't allocate memory for error multicast event\n");
> +		return;
> +	}
> +
> +	hdr = genlmsg_put(msg, 0, 0, &xe->drm.drm_genl_family, 0, DRM_RAS_CMD_ERROR_EVENT);
> +	if (!hdr) {
> +		drm_dbg_driver(&xe->drm, "multicast msg buffer is small\n");
> +		nlmsg_free(msg);
> +		return;
> +	}
> +
> +	genlmsg_end(msg, hdr);
> +
> +	genlmsg_multicast(&xe->drm.drm_genl_family, msg, 0,
> +			  hw_err ?
> +			  DRM_GENL_MCAST_UNCORR_ERR
> +			  : DRM_GENL_MCAST_CORR_ERR,
> +			  GFP_ATOMIC);

I agree that hiding/wrapping any netlink/genetlink API/macro with a DRM
helper would sometimes be redundant, and that in some cases the specific
DRM driver would have to "dirty its hands" and deal with netlink (e.g.
fill_error_details() in patch #3).
However, maybe here a DRM helper would have been useful, so that we won't
see a copy of this sequence in other DRM drivers?

Thanks,
Tomer

> +}
> +
>  static void
>  xe_hw_error_source_handler(struct xe_tile *tile, const enum hardware_error hw_err)
>  {
> @@ -849,6 +880,8 @@ xe_hw_error_source_handler(struct xe_tile *tile, const enum hardware_error hw_er
>  	}
>
>  	xe_mmio_write32(gt, DEV_ERR_STAT_REG(hw_err), errsrc);
> +
> +	generate_netlink_event(tile_to_xe(tile), hw_err);
>  unlock:
>  	spin_unlock_irqrestore(&tile_to_xe(tile)->irq.lock, flags);
>  }
RE: [RFC v2 5/5] drm/xe/RAS: send multicast event on occurrence of an error
>-----Original Message-----
>From: Aravind Iddamsetty
>Sent: Friday, October 20, 2023 11:59 AM
>To: intel...@lists.freedesktop.org; dri-devel@lists.freedesktop.org;
>alexander.deuc...@amd.com; airl...@gmail.com; dan...@ffwll.ch;
>joonas.lahti...@linux.intel.com; ogab...@kernel.org; Tayar, Tomer (Habana);
>hawking.zh...@amd.com; harish.kasiviswanat...@amd.com;
>felix.kuehl...@amd.com; luben.tui...@amd.com; Ruhl, Michael J
>Subject: [RFC v2 5/5] drm/xe/RAS: send multicast event on occurrence of an
>error
>
>Whenever a correctable or an uncorrectable error happens an event is sent
>to the corresponding listeners of these groups.
>
>v2: Rebase

Hi Aravind,

This looks reasonable to me.

Reviewed-by: Michael J. Ruhl

M

>Signed-off-by: Aravind Iddamsetty
>---
> drivers/gpu/drm/xe/xe_hw_error.c | 33
> 1 file changed, 33 insertions(+)
>
>diff --git a/drivers/gpu/drm/xe/xe_hw_error.c b/drivers/gpu/drm/xe/xe_hw_error.c
>index bab6d4cf0b69..b0befb5e01cb 100644
>--- a/drivers/gpu/drm/xe/xe_hw_error.c
>+++ b/drivers/gpu/drm/xe/xe_hw_error.c
>@@ -786,6 +786,37 @@ xe_soc_hw_error_handler(struct xe_tile *tile, const enum hardware_error hw_err)
> 			(HARDWARE_ERROR_MAX << 1) + 1);
> }
>
>+static void
>+generate_netlink_event(struct xe_device *xe, const enum hardware_error hw_err)
>+{
>+	struct sk_buff *msg;
>+	void *hdr;
>+
>+	if (!xe->drm.drm_genl_family.module)
>+		return;
>+
>+	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_ATOMIC);
>+	if (!msg) {
>+		drm_dbg_driver(&xe->drm, "couldn't allocate memory for error multicast event\n");
>+		return;
>+	}
>+
>+	hdr = genlmsg_put(msg, 0, 0, &xe->drm.drm_genl_family, 0, DRM_RAS_CMD_ERROR_EVENT);
>+	if (!hdr) {
>+		drm_dbg_driver(&xe->drm, "multicast msg buffer is small\n");
>+		nlmsg_free(msg);
>+		return;
>+	}
>+
>+	genlmsg_end(msg, hdr);
>+
>+	genlmsg_multicast(&xe->drm.drm_genl_family, msg, 0,
>+			  hw_err ?
>+			  DRM_GENL_MCAST_UNCORR_ERR
>+			  : DRM_GENL_MCAST_CORR_ERR,
>+			  GFP_ATOMIC);
>+}
>+
> static void
> xe_hw_error_source_handler(struct xe_tile *tile, const enum hardware_error hw_err)
> {
>@@ -849,6 +880,8 @@ xe_hw_error_source_handler(struct xe_tile *tile, const enum hardware_error hw_er
> 	}
>
> 	xe_mmio_write32(gt, DEV_ERR_STAT_REG(hw_err), errsrc);
>+
>+	generate_netlink_event(tile_to_xe(tile), hw_err);
> unlock:
> 	spin_unlock_irqrestore(&tile_to_xe(tile)->irq.lock, flags);
> }
>--
>2.25.1