On 09/05/2015 03:27 AM, Gouji, Masayuki wrote: > Hi, Corey > > I think the reason why timer stops is that timer system in IPMI is based on > only the request from outside. > > This is typical case > > 1) IPMI driver is in idle state. No pending messages , SI_timer is already > stopped. > 2) Some internal events happens and BMC raised interrupt. > 3) Driver send IPMI_GET_MSG_FLAGS_CMD from smi_event_handler via > smi_info->handlers->start_transaction > start_kcs_transaction(=handlers->start_transaction) never set timer
Ah then that's the bug. The attached patch should start the timer for internal messages as well as external messages. I can't believe I missed that. If you could test this, it would be great. Thanks, -corey > > From kcs , it doesn't matter who sent the message . > For working around the hang up , the timer is needed for those situasion > either. > Ipmi_timeout may be invoked in many cases.But it won't start si_timeout > because of the reason described below. > > > Some simular messages invoked internally , like > IPMI_READ_EVENT_MSG_BUFFER_CMD/ IPMI_GET_MSG_CMD and so on , may fall in the > same situation. > All are sent directly , using smi_info->handlers->start_transaction > > Watching where check_start_timer_thread is called from ,these are sender() > and set_need_watch(). > > > First, in check_start_timer_thread() , It has > > if (smi_info->si_state == SI_NORMAL && smi_info->curr_msg == NULL) > > We have situation where both conditions are not met . > When sending ie. IPMI_GET_MSG_FLAGS_CMD , smi_info->si_state > ==SI_GETTING_FLAGS and smi_info->curr_msg is occupied by > IPMI_GET_MSG_FLAGS_CMD. > > Simply , the condition for timer restart is complies for the messages sent > by APPL or other handlers. > > So, I think this code should be like this. > > > if ((smi_info->si_state == SI_NORMAL && smi_info->curr_msg == NULL ) > || smi_info->si_state == SI_GETTING_FLAGS > || smi_info->si_state == SI_GETTING_MESSAGES > ..... ) > > This is just showing the rough sketch for the fix. > > Second , > > Ipmi_timeout stops under some condition. > > - there is no event_waiter , which may happen eveb if rare. > - there is no pending messages after sending > - > In this condition, ipmi_timeout may be restarted by need_waiter(). > > This means ipmi_timeout can remain stopping while the internall messages are > being sent. > > This may require some fix too. > > > > If you think the way to let check_start_timer_thread() restart timer > correctly is better, > I'd think again according to this idea. > > By the way , my new patch still includes a BUG. > > I wanted to detect the firast attempt just after status switched to > KCS_WAIT_WRITE_START > and used kcs->write_count == 1 > But this for the last attempt. I have to find another way if I adhere to it. > Anyway this is wrong yet. > > > > Regards, > M.Gouji > > >> -----Original Message----- >> From: Corey Minyard [mailto:[email protected]] On Behalf Of Corey Minyard >> Sent: Saturday, September 05, 2015 4:04 AM >> To: Gouji, Masayuki/郷治 将之; [email protected] >> Subject: Re: [Openipmi-developer] [PATCH] ipmi:driver not to stop when the >> hardware not respond in irq/revised >> >> On 09/04/2015 04:56 AM, Gouji, Masayuki wrote: >>> Hi, Corey >>> >>> This is new one which includes the missed braces and function for modifying >> waiting period (default 0). >>> I think it complies to your comment. >> Well, not exactly. Thinking through this, if you are going to delay and >> spin, you might as well spin completely in the KCS code, spinning on the >> IBF value. That would be a lot simpler and it would return as soon as IBF >> was set. However, spinning is just not a good idea, especially in hardirq >> or softirq mode (where some of this would be happening). >> >> I've spent some time thinking about this and looking at the code, and I >> realized that the timer should be running whenever a message is being sent. >> Which is why no one has seen this before, and it means we are dealing with >> a different issue. If you look at the sender() function in ipmi_si_intf.c, >> it calls check_start_timer_thread(), which will start the timer and start >> the message send. The timer code checks at the end if it is idle, and if >> it is idle it doesn't restart the timer. >> >> So somehow the timer is not getting started or is getting canceled. >> That's the problem we need to fix. >> >> Looking at the code: >> >> static void check_start_timer_thread(struct smi_info *smi_info) { >> if (smi_info->si_state == SI_NORMAL && smi_info->curr_msg == NULL) { >> smi_mod_timer(smi_info, jiffies + SI_TIMEOUT_JIFFIES); >> >> if (smi_info->thread) >> wake_up_process(smi_info->thread); >> >> start_next_msg(smi_info); >> smi_event_handler(smi_info, 0); >> } >> } >> >> perhaps start_next_msg() should come before smi_mod_timer(). Maybe >> there's a race where the timer goes off before start_next_msg()? Well, >> that shouldn't be possible because this code is called under the lock and >> the first the time timeout does is claim the lock. But something is going >> on here. >> >> -corey >>
>From cd3f670fe64686243fe6848d734c644dc2940fb1 Mon Sep 17 00:00:00 2001 From: Corey Minyard <[email protected]> Date: Sat, 5 Sep 2015 17:44:13 -0500 Subject: [PATCH 1/2] ipmi: Start the timer and thread on internal msgs The timer and thread were not being started for internal messages, so in interrupt mode if something hung the timer would never go off and clean things up. Factor out the internal message sending and start the timer for those messages, too. Signed-off-by: Corey Minyard <[email protected]> --- drivers/char/ipmi/ipmi_si_intf.c | 73 ++++++++++++++++++++++++---------------- 1 file changed, 44 insertions(+), 29 deletions(-) diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c index 7dea62a..afb98b6 100644 --- a/drivers/char/ipmi/ipmi_si_intf.c +++ b/drivers/char/ipmi/ipmi_si_intf.c @@ -412,18 +412,42 @@ static enum si_sm_result start_next_msg(struct smi_info *smi_info) return rv; } -static void start_check_enables(struct smi_info *smi_info) +static void smi_mod_timer(struct smi_info *smi_info, unsigned long new_val) +{ + smi_info->last_timeout_jiffies = jiffies; + mod_timer(&smi_info->si_timer, new_val); + smi_info->timer_running = true; +} + +/* + * Start a new message and (re)start the timer and thread. + */ +static void start_new_msg(struct smi_info *smi_info, unsigned char *msg, + unsigned int size) +{ + smi_mod_timer(smi_info, jiffies + SI_TIMEOUT_JIFFIES); + + if (smi_info->thread) + wake_up_process(smi_info->thread); + + smi_info->handlers->start_transaction(smi_info->si_sm, msg, size); +} + +static void start_check_enables(struct smi_info *smi_info, bool start_timer) { unsigned char msg[2]; msg[0] = (IPMI_NETFN_APP_REQUEST << 2); msg[1] = IPMI_GET_BMC_GLOBAL_ENABLES_CMD; - smi_info->handlers->start_transaction(smi_info->si_sm, msg, 2); + if (start_timer) + start_new_msg(smi_info, msg, 2); + else + smi_info->handlers->start_transaction(smi_info->si_sm, msg, 2); smi_info->si_state = SI_CHECKING_ENABLES; } -static void start_clear_flags(struct smi_info *smi_info) +static void start_clear_flags(struct smi_info *smi_info, bool start_timer) { unsigned char msg[3]; @@ -432,7 +456,10 @@ static void start_clear_flags(struct smi_info *smi_info) msg[1] = IPMI_CLEAR_MSG_FLAGS_CMD; msg[2] = WDT_PRE_TIMEOUT_INT; - smi_info->handlers->start_transaction(smi_info->si_sm, msg, 3); + if (start_timer) + start_new_msg(smi_info, msg, 3); + else + smi_info->handlers->start_transaction(smi_info->si_sm, msg, 3); smi_info->si_state = SI_CLEARING_FLAGS; } @@ -442,10 +469,8 @@ static void start_getting_msg_queue(struct smi_info *smi_info) smi_info->curr_msg->data[1] = IPMI_GET_MSG_CMD; smi_info->curr_msg->data_size = 2; - smi_info->handlers->start_transaction( - smi_info->si_sm, - smi_info->curr_msg->data, - smi_info->curr_msg->data_size); + start_new_msg(smi_info, smi_info->curr_msg->data, + smi_info->curr_msg->data_size); smi_info->si_state = SI_GETTING_MESSAGES; } @@ -455,20 +480,11 @@ static void start_getting_events(struct smi_info *smi_info) smi_info->curr_msg->data[1] = IPMI_READ_EVENT_MSG_BUFFER_CMD; smi_info->curr_msg->data_size = 2; - smi_info->handlers->start_transaction( - smi_info->si_sm, - smi_info->curr_msg->data, - smi_info->curr_msg->data_size); + start_new_msg(smi_info, smi_info->curr_msg->data, + smi_info->curr_msg->data_size); smi_info->si_state = SI_GETTING_EVENTS; } -static void smi_mod_timer(struct smi_info *smi_info, unsigned long new_val) -{ - smi_info->last_timeout_jiffies = jiffies; - mod_timer(&smi_info->si_timer, new_val); - smi_info->timer_running = true; -} - /* * When we have a situtaion where we run out of memory and cannot * allocate messages, we just leave them in the BMC and run the system @@ -478,11 +494,11 @@ static void smi_mod_timer(struct smi_info *smi_info, unsigned long new_val) * Note that we cannot just use disable_irq(), since the interrupt may * be shared. */ -static inline bool disable_si_irq(struct smi_info *smi_info) +static inline bool disable_si_irq(struct smi_info *smi_info, bool start_timer) { if ((smi_info->irq) && (!smi_info->interrupt_disabled)) { smi_info->interrupt_disabled = true; - start_check_enables(smi_info); + start_check_enables(smi_info, start_timer); return true; } return false; @@ -492,7 +508,7 @@ static inline bool enable_si_irq(struct smi_info *smi_info) { if ((smi_info->irq) && (smi_info->interrupt_disabled)) { smi_info->interrupt_disabled = false; - start_check_enables(smi_info); + start_check_enables(smi_info, true); return true; } return false; @@ -510,7 +526,7 @@ static struct ipmi_smi_msg *alloc_msg_handle_irq(struct smi_info *smi_info) msg = ipmi_alloc_smi_msg(); if (!msg) { - if (!disable_si_irq(smi_info)) + if (!disable_si_irq(smi_info, true)) smi_info->si_state = SI_NORMAL; } else if (enable_si_irq(smi_info)) { ipmi_free_smi_msg(msg); @@ -526,7 +542,7 @@ static void handle_flags(struct smi_info *smi_info) /* Watchdog pre-timeout */ smi_inc_stat(smi_info, watchdog_pretimeouts); - start_clear_flags(smi_info); + start_clear_flags(smi_info, true); smi_info->msg_flags &= ~WDT_PRE_TIMEOUT_INT; if (smi_info->intf) ipmi_smi_watchdog_pretimeout(smi_info->intf); @@ -879,8 +895,7 @@ static enum si_sm_result smi_event_handler(struct smi_info *smi_info, msg[0] = (IPMI_NETFN_APP_REQUEST << 2); msg[1] = IPMI_GET_MSG_FLAGS_CMD; - smi_info->handlers->start_transaction( - smi_info->si_sm, msg, 2); + start_new_msg(smi_info, msg, 2); smi_info->si_state = SI_GETTING_FLAGS; goto restart; } @@ -910,7 +925,7 @@ static enum si_sm_result smi_event_handler(struct smi_info *smi_info, * disable and messages disabled. */ if (smi_info->supports_event_msg_buff || smi_info->irq) { - start_check_enables(smi_info); + start_check_enables(smi_info, true); } else { smi_info->curr_msg = alloc_msg_handle_irq(smi_info); if (!smi_info->curr_msg) @@ -3632,7 +3647,7 @@ static int try_smi_init(struct smi_info *new_smi) * Start clearing the flags before we enable interrupts or the * timer to avoid racing with the timer. */ - start_clear_flags(new_smi); + start_clear_flags(new_smi, false); /* * IRQ is defined to be set when non-zero. req_events will @@ -3927,7 +3942,7 @@ static void cleanup_one_si(struct smi_info *to_clean) poll(to_clean); schedule_timeout_uninterruptible(1); } - disable_si_irq(to_clean); + disable_si_irq(to_clean, false); while (to_clean->curr_msg || (to_clean->si_state != SI_NORMAL)) { poll(to_clean); schedule_timeout_uninterruptible(1); -- 1.8.3.1
------------------------------------------------------------------------------
_______________________________________________ Openipmi-developer mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/openipmi-developer
