Hi,Corey 

Ok , We will test it.

But, please wait a little because this problem happens only on our specific 
products ,which we have a small numbers of.

We have to reserve that first amog our sections internally.

Regards,
M.Gouji.


> -----Original Message-----
> From: Corey Minyard [mailto:[email protected]] On Behalf Of Corey Minyard
> Sent: Sunday, September 06, 2015 8:03 AM
> To: Gouji, Masayuki/郷治 将之; [email protected]
> Subject: Re: [Openipmi-developer] [PATCH] ipmi:driver not to stop when the
> hardware not respond in irq/revised
> 
> On 09/05/2015 03:27 AM, Gouji, Masayuki wrote:
> > Hi, Corey
> >
> > I think the reason why timer stops is that timer system in IPMI is based
> on only the request from outside.
> >
> > This is typical case
> >
> > 1) IPMI driver is in idle state. No pending messages , SI_timer is already
> stopped.
> > 2) Some internal events happens and BMC raised interrupt.
> > 3) Driver send IPMI_GET_MSG_FLAGS_CMD from smi_event_handler  via
> smi_info->handlers->start_transaction
> >     start_kcs_transaction(=handlers->start_transaction) never set
> > timer
> 
> Ah then that's the bug.  The attached patch should start the timer for
> internal messages as well as external messages.  I can't believe I missed
> that.
> 
> If you could test this, it would be great.
> 
> Thanks,
> 
> -corey
> 
> >
> >  From kcs , it doesn't matter who sent the message .
> >  For working around the hang up , the timer is needed for those situasion
> either.
> >  Ipmi_timeout may be invoked in many cases.But it won't start si_timeout
> because of the reason described below.
> >
> >
> >  Some simular messages invoked internally , like
> IPMI_READ_EVENT_MSG_BUFFER_CMD/ IPMI_GET_MSG_CMD and so on , may fall in
> the same situation.
> >  All are sent directly , using smi_info->handlers->start_transaction
> >
> > Watching where check_start_timer_thread is called from ,these are
> sender() and set_need_watch().
> >
> >
> > First, in check_start_timer_thread() , It has
> >
> >  if (smi_info->si_state == SI_NORMAL && smi_info->curr_msg == NULL)
> >
> >   We have situation where both conditions are not met .
> >   When sending ie. IPMI_GET_MSG_FLAGS_CMD ,  smi_info->si_state
> ==SI_GETTING_FLAGS and  smi_info->curr_msg  is occupied by
> IPMI_GET_MSG_FLAGS_CMD.
> >
> >  Simply , the condition for timer restart is complies for the messages
> sent by APPL or other handlers.
> >
> > So, I think this code should be like this.
> >
> >
> > if ((smi_info->si_state == SI_NORMAL  && smi_info->curr_msg == NULL )
> >     || smi_info->si_state == SI_GETTING_FLAGS
> >     || smi_info->si_state == SI_GETTING_MESSAGES
> >    .....                                               )
> >
> > This is just showing the rough sketch for the fix.
> >
> > Second ,
> >
> >  Ipmi_timeout stops under some condition.
> >
> >   - there is no event_waiter , which may happen eveb if rare.
> >   - there is no pending messages after sending
> >   -
> >  In this condition, ipmi_timeout may be restarted by need_waiter().
> >
> >  This means ipmi_timeout can remain stopping while the internall messages
> are being sent.
> >
> >  This may require some fix too.
> >
> >
> >
> > If you think the way to let check_start_timer_thread() restart timer
> > correctly is better, I'd think  again according to this idea.
> >
> > By the way , my new patch still includes a BUG.
> >
> > I wanted to detect the firast attempt just after status switched to
> > KCS_WAIT_WRITE_START and used kcs->write_count == 1 But this for the
> > last attempt. I have to find another way if I adhere to it.
> > Anyway this is wrong yet.
> >
> >
> >
> > Regards,
> > M.Gouji
> >
> >
> >> -----Original Message-----
> >> From: Corey Minyard [mailto:[email protected]] On Behalf Of Corey
> >> Minyard
> >> Sent: Saturday, September 05, 2015 4:04 AM
> >> To: Gouji, Masayuki/郷治 将之;
> [email protected]
> >> Subject: Re: [Openipmi-developer] [PATCH] ipmi:driver not to stop
> >> when the hardware not respond in irq/revised
> >>
> >> On 09/04/2015 04:56 AM, Gouji, Masayuki wrote:
> >>> Hi, Corey
> >>>
> >>> This is new one which includes the missed braces and function for
> >>> modifying
> >> waiting period (default 0).
> >>> I think it complies to your comment.
> >> Well, not exactly.  Thinking through this, if you are going to delay
> >> and spin, you might as well spin completely in the KCS code, spinning
> >> on the IBF value.  That would be a lot simpler and it would return as
> >> soon as IBF was set.  However, spinning is just not a good idea,
> >> especially in hardirq or softirq mode (where some of this would be
> happening).
> >>
> >> I've spent some time thinking about this and looking at the code, and
> >> I realized that the timer should be running whenever a message is being
> sent.
> >> Which is why no one has seen this before, and it means we are dealing
> >> with a different issue.  If you look at the sender() function in
> >> ipmi_si_intf.c, it calls check_start_timer_thread(), which will start
> >> the timer and start the message send.  The timer code checks at the
> >> end if it is idle, and if it is idle it doesn't restart the timer.
> >>
> >> So somehow the timer is not getting started or is getting canceled.
> >> That's the problem we need to fix.
> >>
> >> Looking at the code:
> >>
> >> static void check_start_timer_thread(struct smi_info *smi_info) {
> >>     if (smi_info->si_state == SI_NORMAL && smi_info->curr_msg == NULL)
> {
> >>         smi_mod_timer(smi_info, jiffies + SI_TIMEOUT_JIFFIES);
> >>
> >>         if (smi_info->thread)
> >>             wake_up_process(smi_info->thread);
> >>
> >>         start_next_msg(smi_info);
> >>         smi_event_handler(smi_info, 0);
> >>     }
> >> }
> >>
> >> perhaps start_next_msg() should come before smi_mod_timer().  Maybe
> >> there's a race where the timer goes off before start_next_msg()?
> >> Well, that shouldn't be possible because this code is called under
> >> the lock and the first the time timeout does is claim the lock.  But
> >> something is going on here.
> >>
> >> -corey
> >>

------------------------------------------------------------------------------
_______________________________________________
Openipmi-developer mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/openipmi-developer

Reply via email to