On Sun, Sep 13, 2020 at 08:02:03PM +0800, Xianting Tian wrote:
> We can't get bmc's device id with low probability when loading ipmi driver,
> it caused bmc device register failed. This issue may caused by bad lpc
> signal quality. When this issue happened, we got below kernel printks:
>       [Wed Sep  9 19:52:03 2020] ipmi_si IPI0001:00: IPMI message handler: 
> device id demangle failed: -22
>       [Wed Sep  9 19:52:03 2020] IPMI BT: using default values
>       [Wed Sep  9 19:52:03 2020] IPMI BT: req2rsp=5 secs retries=2
>       [Wed Sep  9 19:52:03 2020] ipmi_si IPI0001:00: Unable to get the device 
> id: -5
>       [Wed Sep  9 19:52:04 2020] ipmi_si IPI0001:00: Unable to register 
> device: error -5
> 
> When this issue happened, we want to manually unload the driver and try to
> load it again, but it can't be unloaded by 'rmmod' as it is already 'in use'.

I'm not sure this patch is a good idea; it would cause a long boot delay
in situations where there really isn't a BMC out there.  Yes, it
happens.

You don't have to reload the driver to add a device, though.  You can
hot-add devices using /sys/modules/ipmi_si/parameters/hotmod.  Look in
Documentation/driver-api/ipmi.rst for details.

Does that work for you?

-corey

> 
> We add below 'printk' in handle_one_recv_msg(), when this issue happened,
> the msg we received is "Recv: 1c 01 d5", which means the data_len is 1,
> data[0] is 0xd5.
>       Debug code:
>       static int handle_one_recv_msg(struct ipmi_smi *intf,
>                                struct ipmi_smi_msg *msg) {
>               printk("Recv: %*ph\n", msg->rsp_size, msg->rsp);
>               ... ...
>       }
> Then in ipmi_demangle_device_id(), it returned '-EINVAL' as 'data_len < 7'
> and 'data[0] != 0'.
> 
> We used this patch to retry to get device id when error happen, we
> reproduced this issue again and the retry succeed on the first retry, we
> finally got the correct msg and then all is ok:
> Recv: 1c 01 00 01 81 05 84 02 af db 07 00 01 00 b9 00 10 00
> 
> So use retry machanism in this patch to give bmc more opportunity to
> correctly response kernel.
> 
> Signed-off-by: Xianting Tian <tian.xiant...@h3c.com>
> ---
>  drivers/char/ipmi/ipmi_msghandler.c | 17 ++++++++++++++---
>  1 file changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/char/ipmi/ipmi_msghandler.c 
> b/drivers/char/ipmi/ipmi_msghandler.c
> index 737c0b6b2..bfb2de77a 100644
> --- a/drivers/char/ipmi/ipmi_msghandler.c
> +++ b/drivers/char/ipmi/ipmi_msghandler.c
> @@ -34,6 +34,7 @@
>  #include <linux/uuid.h>
>  #include <linux/nospec.h>
>  #include <linux/vmalloc.h>
> +#include <linux/delay.h>
>  
>  #define IPMI_DRIVER_VERSION "39.2"
>  
> @@ -60,6 +61,9 @@ enum ipmi_panic_event_op {
>  #else
>  #define IPMI_PANIC_DEFAULT IPMI_SEND_PANIC_EVENT_NONE
>  #endif
> +
> +#define GET_DEVICE_ID_MAX_RETRY      5
> +
>  static enum ipmi_panic_event_op ipmi_send_panic_event = IPMI_PANIC_DEFAULT;
>  
>  static int panic_op_write_handler(const char *val,
> @@ -2426,19 +2430,26 @@ send_get_device_id_cmd(struct ipmi_smi *intf)
>  static int __get_device_id(struct ipmi_smi *intf, struct bmc_device *bmc)
>  {
>       int rv;
> -
> -     bmc->dyn_id_set = 2;
> +     unsigned int retry_count = 0;
>  
>       intf->null_user_handler = bmc_device_id_handler;
>  
> +retry:
> +     bmc->dyn_id_set = 2;
> +
>       rv = send_get_device_id_cmd(intf);
>       if (rv)
>               return rv;
>  
>       wait_event(intf->waitq, bmc->dyn_id_set != 2);
>  
> -     if (!bmc->dyn_id_set)
> +     if (!bmc->dyn_id_set) {
> +             msleep(1000);
> +             if (++retry_count <= GET_DEVICE_ID_MAX_RETRY)
> +                     goto retry;
> +
>               rv = -EIO; /* Something went wrong in the fetch. */
> +     }
>  
>       /* dyn_id_set makes the id data available. */
>       smp_rmb();
> -- 
> 2.17.1
> 


_______________________________________________
Openipmi-developer mailing list
Openipmi-developer@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openipmi-developer

Reply via email to