On 02/15/2014 09:44 PM, Corey Minyard wrote: > I'm copying Rocky Craig, who wrote this state machine.
Sorry for the slow response. >> I fould a problem: when a cmd timeout and just >> in that time bt->seq < 2, system will alway keep >> retrying and we can't send any cmd to bmc. >> >> the error message is like this: >> [ 530.908621] IPMI BT: timeout in RD_WAIT [ ] 1 retries left >> [ 582.661329] IPMI BT: timeout in RD_WAIT [ ] Exactly what are you using for a BMC? I need to see the trace before this. Evidently a command was sent that did not complete on the BMC side. As Corey suggested, the BMC is the prime suspect at the moment. The IPMI driver is waiting for the BMC to do the next right thing. I wrote the state machine flow (and state names) trying to follow the IPMI spec very closely. I submit that starting from the code is the wrong place to start. First insure you cleanly understand the spec, then the code should read much easier. This methodology has helped out over many years and many systems facing similar challenges. Regards, Rocky >> [ 582.661334] failed 2 retries, sending error response >> [ 582.661337] IPMI: BT reset (takes 5 secs) >> [ 693.335307] IPMI BT: timeout in RD_WAIT [ ] >> [ 693.335312] failed 2 retries, sending error response >> [ 693.335315] IPMI: BT reset (takes 5 secs) >> [ 804.825161] IPMI BT: timeout in RD_WAIT [ ] >> [ 804.825166] failed 2 retries, sending error response >> [ 804.825169] IPMI: BT reset (takes 5 secs) >> ... >> >> When BT reset, a cmd "warm reset" will be sent to bmc, but this cmd >> is Optional in spec(refer to ipmi-interface-spec-v2). Some machines >> don't support this cmd. >> >> So, bt->init is introduced. Only during insmod, we do BT reset when >> response timeout to avoid system crash. >> >> Reported-by: Hu Shiyuan <[email protected]> >> Signed-off-by: Xie XiuQi <[email protected]> >> Cc: [email protected] # 3.4+ >> --- >> drivers/char/ipmi/ipmi_bt_sm.c | 9 +++++++-- >> 1 file changed, 7 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/char/ipmi/ipmi_bt_sm.c b/drivers/char/ipmi/ipmi_bt_sm.c >> index a22a7a5..b4a7b2a 100644 >> --- a/drivers/char/ipmi/ipmi_bt_sm.c >> +++ b/drivers/char/ipmi/ipmi_bt_sm.c >> @@ -107,6 +107,7 @@ struct si_sm_data { >> int BT_CAP_outreqs; >> long BT_CAP_req2rsp; >> int BT_CAP_retries; /* Recommended retries */ >> + int init; >> }; >> >> #define BT_CLR_WR_PTR 0x01 /* See IPMI 1.5 table 11.6.4 */ >> @@ -438,8 +439,8 @@ static enum si_sm_result error_recovery(struct >> si_sm_data *bt, >> if (!bt->nonzero_status) >> printk(KERN_ERR "IPMI BT: stuck, try power cycle\n"); >> >> - /* this is most likely during insmod */ >> - else if (bt->seq <= (unsigned char)(bt->BT_CAP_retries & 0xFF)) { >> + /* only during insmod */ >> + else if (!bt->init) { >> printk(KERN_WARNING "IPMI: BT reset (takes 5 secs)\n"); >> bt->state = BT_STATE_RESET1; >> return SI_SM_CALL_WITHOUT_DELAY; >> @@ -589,6 +590,10 @@ static enum si_sm_result bt_event(struct si_sm_data >> *bt, long time) >> BT_STATE_CHANGE(BT_STATE_READ_WAIT, >> SI_SM_CALL_WITHOUT_DELAY); >> bt->state = bt->complete; >> + >> + if (!bt->init && bt->seq) >> + bt->init = 1; >> + >> return bt->state == BT_STATE_IDLE ? /* where to next? */ >> SI_SM_TRANSACTION_COMPLETE : /* normal */ >> SI_SM_CALL_WITHOUT_DELAY; /* Startup magic */ ------------------------------------------------------------------------------ Managing the Performance of Cloud-Based Applications Take advantage of what the Cloud has to offer - Avoid Common Pitfalls. Read the Whitepaper. http://pubads.g.doubleclick.net/gampad/clk?id=121054471&iu=/4140/ostg.clktrk _______________________________________________ Openipmi-developer mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/openipmi-developer
