> > Thanks for the bug report and debugging info.  I think I know what is
> > going on, I've attached a patch that should hopefully fix it.
> > Basically, it looks like the BMC is alive enough that it sort of
> > responds to the host, but not alive enough to actually complete a
> > transaction.  The driver should not retry immediately in that case; it
> > needs to delay a bit.
> >
> > It passes all my tests, but the situation you are in would be hard to
> > manufacture for me.
> >
> > Can you try this patch?
>
> Thanks for the super quick response, I'll try out this patch and report
> back my findings.
>
> Best regards
> Mark

The patch looks good.  Without the patch I was able to reproduce the
problem on kernels 6.6 and 6.12 (but not 6.1) after 5-20 attempts of
running 'ipmitool mc reset cold' every 2 minutes.  With the patch, I have
run it 50 times without incident.  The hosed counter isn't as much of an
indicator as I thought: I saw it in the tens of thousands both with and
without the patch, I have also seen it in the hundreds of thousands
without the patch, and on other hardware I have seen it reach 5 million
in one hour without the patch (but also without incident).
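For anyone wanting to repeat the test, the reproduction loop described
above (a cold BMC reset every 2 minutes) can be sketched roughly as
below.  This is a minimal, hypothetical sketch, not the script used
here: it assumes ipmitool is installed and the script runs with enough
privilege on the host under test.  The function name run_reset_loop,
the 50-attempt default and the 60-second per-command timeout are
illustrative assumptions.  A timed-out command is recorded as None.

```python
import subprocess
import time
from typing import Callable, List, Optional, Sequence

# Cold BMC reset, as in the report above.
RESET_CMD = ["ipmitool", "mc", "reset", "cold"]


def run_reset_loop(
    attempts: int = 50,
    interval_s: float = 120.0,
    cmd: Sequence[str] = RESET_CMD,
    timeout_s: float = 60.0,  # assumed per-command limit, illustrative only
    sleep: Callable[[float], None] = time.sleep,
) -> List[Optional[int]]:
    """Run `cmd` `attempts` times, pausing `interval_s` seconds between runs.

    Returns one entry per run: the exit status, or None if the command
    did not finish within `timeout_s` (subprocess.run kills it then).
    """
    statuses: List[Optional[int]] = []
    for i in range(attempts):
        try:
            result = subprocess.run(list(cmd), check=False, timeout=timeout_s)
            status: Optional[int] = result.returncode
        except subprocess.TimeoutExpired:
            status = None  # command hung; worth checking the host/BMC state
        statuses.append(status)
        print(f"attempt {i + 1}/{attempts}: status {status}")
        # No pause is needed after the final attempt.
        if i + 1 < attempts:
            sleep(interval_s)
    return statuses
```

Calling run_reset_loop() with the defaults would take roughly 100
minutes.  Note that the exit status alone may not reveal the driver
problem.  Checking the host's IPMI state between resets is left out
of this sketch.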

We will incorporate your patch into our builds so that we avoid hitting
this problem in production again.

Best regards
Mark
_______________________________________________
Openipmi-developer mailing list
Openipmi-developer@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openipmi-developer
