On 03/26/2014 11:51 AM, Strösser, Bodo wrote:
> This time with the CC to the list. Sorry for the noise.
>
>
>> -----Original Message-----
>> From: Corey Minyard [mailto:[email protected]] On Behalf Of Corey Minyard
>> Sent: Wednesday, March 26, 2014 12:41 AM
>> To: Strösser, Bodo
>> Cc: [email protected]
>> Subject: Re: ipmi_si: KCS: timeout waiting for IBF or OBF does not always 
>> work
>>
>> On 03/25/2014 05:32 PM, Strösser, Bodo wrote:
>>> Attached you'll find my modified patch to explain my hypothesis better.
>>> It seems to work fine. Thus, my hypothesis seems to be right.
>> timer_running needs to be a bool,
> Yes, of course. int is what we had to use in the good old times  ;-)

Indeed, old habits die hard :).

>
>
>> but beyond that, this is good.  I did
>> think about this, but I thought timer_running() would be enough.  But
>> thinking about it some more, I think you are right.  It's not quite good
>> enough.
>
> Have attached a new version of the patch. This time using bool and with your
> comments, that I've slightly extended. Have added my From and Signed-off-by
> hoping that this is ok for you.

Perfect, I have it queued for the next merge window.

Thanks again,

-corey

>
> The patch is based on SuSE-SLES11-SP2 (3.0.101-xxx), but it applies to
> mainline also, printing some offsets.
>
> Bodo
>
>
>
>> -corey
>>
>>> Bodo
>>>
>>>> -----Original Message-----
>>>> From: Strösser, Bodo
>>>> Sent: Tuesday, March 25, 2014 9:41 PM
>>>> To: '[email protected]'
>>>> Cc: [email protected]
>>>> Subject: RE: ipmi_si: KCS: timeout waiting for IBF or OBF does not always 
>>>> work
>>>>
>>>> Hi Corey,
>>>>
>>>> Sorry, the patch isn't good. The problem is, that the timeout time now is
>>>> a multiple of what it should be (5 seconds).
>>>>
>>>> I guess, I know the reason for this:
>>>> I think, the timer is reset quite often while the driver loops waiting for 
>>>> OBF
>>>> in KCS status 40. The reset can happen, as timer_pending() is not the 
>>>> right test.
>>>> For example the timer isn't pending if smi_timeout already is started, but 
>>>> has to
>>>> wait for the lock the is held by ipmi_thread().
>>>>
>>>> Maybe we could add a flag to smi_info, that is set in smi_mod_timer() and 
>>>> is reset
>>>> in smi_timeout() id smi_mod_timer() is not called. Then ipmi_thread() 
>>>> could test
>>>> the flag instead of calling timer_pending(). But maybe you have a better 
>>>> idea?
>>>>
>>>> Bodo
>>>>
>>>>> -----Original Message-----
>>>>> From: Corey Minyard [mailto:[email protected]] On Behalf Of Corey 
>>>>> Minyard
>>>>> Sent: Monday, March 24, 2014 7:54 PM
>>>>> To: Strösser, Bodo
>>>>> Cc: [email protected]
>>>>> Subject: Re: ipmi_si: KCS: timeout waiting for IBF or OBF does not always 
>>>>> work
>>>>>
>>>>> On 03/24/2014 11:30 AM, Strösser, Bodo wrote:
>>>>>> Hi Corey,
>>>>>>
>>>>>> Thank you for the quick reply and the patches.
>>>>>>
>>>>>> We had tests running for the entire weekend with a driver
>>>>>> patched with my change. No more errors. So it works.
>>>>> It certainy fixes the issue you were seeing, but it would cause some
>>>>> other issues that would possibly cause some error conditions to not be
>>>>> handled correctly, and it would make the power management people very
>>>>> unhappy with me :-0.
>>>>>
>>>>> Hopefully the patch I submitted will fix the issue without those side
>>>>> effects.  But your patch pointed me in the right direction, so it wasn't
>>>>> without merit.
>>>>>
>>>>> Thanks again,
>>>>>
>>>>> -corey
>>>>>
>>>>>> But of course, your patch is a much more sane fix. I'll test
>>>>>> it ASAP and will report the result.
>>>>>>
>>>>>> Bodo
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Corey Minyard [mailto:[email protected]] On Behalf Of Corey 
>>>>>>> Minyard
>>>>>>> Sent: Monday, March 24, 2014 5:10 PM
>>>>>>> To: Strösser, Bodo
>>>>>>> Cc: [email protected]
>>>>>>> Subject: Re: ipmi_si: KCS: timeout waiting for IBF or OBF does not 
>>>>>>> always work
>>>>>>>
>>>>>>> On 03/21/2014 04:02 PM, Strösser, Bodo wrote:
>>>>>>>> Hi Corey,
>>>>>>>>
>>>>>>>> We are using servers, that have a BMC connected via KCS.
>>>>>>>> We are running SuSE SLES11-SP2, kernel 3.0.101-0.7.17, but
>>>>>>>> the problem I see seems to be relevant for Mainline also.
>>>>>>>>
>>>>>>>> For our BMC, ipmi_si.ko uses both, ipmi_thread() and
>>>>>>>> smi_timeout() in parallel to call smi_event_handler().
>>>>>>>> But only smi_timeout() passes the elapsed time for timeout
>>>>>>>> handling to smi_event_handler(), but not ipmi_thread().
>>>>>>>>
>>>>>>>> After smi_timeout detected IDLE, it does not start the
>>>>>>>> timer again. Instead, a call to sender() will restart the
>>>>>>>> timer. Unfortunately, if a command to the BMC is not
>>>>>>>> started via sender(), but generated internally (eg. from
>>>>>>>> smi_event_handler() because of smi_info->req_events) and
>>>>>>>> started via smi_info->handlers->start_transaction(),
>>>>>>>> AFAICS the timer is not started. So, the timeouts in the
>>>>>>>> KCS driver while waiting for IBF or OBF won't work in this
>>>>>>>> situation.
>>>>>>>>
>>>>>>>> The BMCs in our servers seem to have a sporadic problem
>>>>>>>> that leads to the error recovery being started because of
>>>>>>>> an timeout while waiting for OBF. But in case the timer is
>>>>>>>> stopped while the problem occurs, the driver loop waiting
>>>>>>>> for OBF forever.
>>>>>>> I think I can see what is happening here, but I don't think your fix is
>>>>>>> correct.  With your fix the driver would never time out if the interface
>>>>>>> wasn't working, because it's starting the timer in the thread, the timer
>>>>>>> is what is used to time out operations, and the thread will run
>>>>>>> constantly while the driver is operating.
>>>>>>>
>>>>>>> Try the attached patches instead, please.  I've done some minimal 
>>>>>>> testing.
>>>>>>>
>>>>>>>> P.S.: There might be another minor issue in ipmi_kcs_sm.c
>>>>>>>> If check_obf() times out, it starts the error handling but does
>>>>>>>> not reset kcs->obf_timeout. Thus, in state KCS_ERROR2 check_obf()
>>>>>>>> is called with the timer already expired. So I'd like to
>>>>>>>> suggest this small change:
>>>>>>>>
>>>>>>>> --- a/drivers/char/ipmi/ipmi_kcs_sm.c   2014-03-19 10:42:51.000000000 
>>>>>>>> +0100
>>>>>>>> +++ b/drivers/char/ipmi/ipmi_kcs_sm.c   2014-03-19 10:48:33.000000000 
>>>>>>>> +0100
>>>>>>>> @@ -251,8 +251,9 @@ static inline int check_obf(struct si_sm
>>>>>>>>        if (!GET_STATUS_OBF(status)) {
>>>>>>>>                kcs->obf_timeout -= time;
>>>>>>>>                if (kcs->obf_timeout < 0) {
>>>>>>>> -                  start_error_recovery(kcs, "OBF not ready in time");
>>>>>>>> -                  return 1;
>>>>>>>> +                      start_error_recovery(kcs, "OBF not ready in 
>>>>>>>> time");
>>>>>>>> +                      kcs->obf_timeout = OBF_RETRY_TIMEOUT;
>>>>>>>> +                      return 1;
>>>>>>>>                }
>>>>>>>>                return 0;
>>>>>>>>        }
>>>>>>>>
>>>>>>> Indeed, you are right.  I've attached another patch for that.
>>>>>>>
>>>>>>> -corey


------------------------------------------------------------------------------
Learn Graph Databases - Download FREE O'Reilly Book
"Graph Databases" is the definitive new guide to graph databases and their
applications. Written by three acclaimed leaders in the field,
this first edition is now available. Download your free book today!
http://p.sf.net/sfu/13534_NeoTech
_______________________________________________
Openipmi-developer mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/openipmi-developer

Reply via email to