This is almost certainly a bug in the BMC.  The change in your patch
should have no effect: this is the start of a send, and the BMC
interface should be idle at that point, so calling smi_timeout() will
only result in another extraneous read from the IPMI interface (and, of
course, a slightly longer delay).

I would guess that the extra read is what works around the problem.
Before polling was reduced, the driver read from the interface far more
often, which probably covered the BMC bug.  You can test this by
replacing the "smi_timeout()" call added in your patch with
"smi_info->io.inputb(&smi_info->io, 1)",
which will do a single read from the status register.
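
As a rough, untested sketch, that test could look like the hunk below,
applied on top of your patch.  The hunk offsets are taken from your diff
and may not match your tree.  It assumes io is a member embedded in
struct smi_info, hence the "." and "&" spelling; adjust it if your tree
differs.  The smi_timeout() forward declaration from your patch can stay,
it is harmless.

--- a/drivers/char/ipmi/ipmi_si_intf.c
+++ b/drivers/char/ipmi/ipmi_si_intf.c
@@ -897,7 +897,8 @@ static void sender(void                *send_info,
 #endif
 
 	mod_timer(&smi_info->si_timer, jiffies + SI_TIMEOUT_JIFFIES);
-	smi_timeout((unsigned long)smi_info);
+	/* Test only: one extra read of the status register (offset 1). */
+	(void)smi_info->io.inputb(&smi_info->io, 1);
 
 	if (smi_info->thread)
 		wake_up_process(smi_info->thread);

If the watchdog errors stay away with only this read in place, that
would point at the BMC needing the extra status read rather than
anything smi_timeout() itself does.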

-corey

On 01/10/2011 06:49 PM, Brian De Wolf wrote:
> Hello, last October I upgraded to 2.6.35 on a Sun Fire X4100 and found that
> starting the watchdog no longer worked.  It produces this output when
> started:
>
> Oct 21 15:50:14 stephen watchdog[4725]: starting daemon (5.6):
> Oct 21 15:50:14 stephen watchdog[4725]: int=30s realtime=yes sync=no soft=no mla=0 mem=0
> Oct 21 15:50:14 stephen watchdog[4725]: ping: no machine to check
> Oct 21 15:50:14 stephen watchdog[4725]: file: no file to check
> Oct 21 15:50:14 stephen watchdog[4725]: pidfile: no server process to check
> Oct 21 15:50:14 stephen watchdog[4725]: interface: no interface to check
> Oct 21 15:50:14 stephen watchdog[4725]: test=none(0) repair=none alive=/dev/watchdog heartbeat=none temp=none to=root no_act=no
> Oct 21 15:50:14 stephen kernel: IPMI message handler: BMC returned incorrect response, expected netfn 7 cmd 22, got netfn 7 cmd 24
> Oct 21 15:50:14 stephen kernel: IPMI Watchdog: response: Error ff on cmd 22
> Oct 21 15:50:14 stephen watchdog[4725]: write watchdog device gave error 22 = 'Invalid argument'!
> Oct 21 15:51:15 stephen kernel: IPMI message handler: BMC returned incorrect response, expected netfn 7 cmd 35, got netfn 7 cmd 22
> Oct 21 15:51:15 stephen kernel: IPMI message handler: BMC returned incorrect response, expected netfn 7 cmd 22, got netfn 7 cmd 35
>
>
> After some bisecting, I found that the patch that causes this is a
> patch to reduce ipmi polling:
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=3326f4f2276791561af1fd5f2020be0186459813
>
> Unfortunately, the system is unstable if I revert this patch.  It
> crashes with "kernel BUG at kernel/timer.c:851!" (I can provide this
> output on request).
>
>
> I originally sent this directly to Matthew Garrett, but he hasn't been
> responsive for the last month or two, and I would like to eventually be
> able to upgrade to a new kernel without losing functionality.  Matthew
> provided a workaround patch, but it still produced error output
> infrequently.  He said it wasn't clean enough for upstream, but
> hopefully it will give some indication of what he found the problem to
> be:
>
> diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c
> index e829053..3f1e856 100644
> --- a/drivers/char/ipmi/ipmi_si_intf.c
> +++ b/drivers/char/ipmi/ipmi_si_intf.c
> @@ -316,6 +316,7 @@ static int unload_when_empty = 1;
>   static int add_smi(struct smi_info *smi);
>   static int try_smi_init(struct smi_info *smi);
>   static void cleanup_one_si(struct smi_info *to_clean);
> +static void smi_timeout(unsigned long data);
>
>   static ATOMIC_NOTIFIER_HEAD(xaction_notifier_list);
>   static int register_xaction_notifier(struct notifier_block *nb)
> @@ -896,6 +897,7 @@ static void sender(void                *send_info,
>   #endif
>
>       mod_timer(&smi_info->si_timer, jiffies + SI_TIMEOUT_JIFFIES);
> +     smi_timeout((unsigned long)smi_info);
>
>       if (smi_info->thread)
>               wake_up_process(smi_info->thread);
>
> ------------------------------------------------------------------------------
> Gaining the trust of online customers is vital for the success of any company
> that requires sensitive data to be transmitted over the Web.   Learn how to
> best implement a security strategy that keeps consumers' information secure
> and instills the confidence they need to proceed with transactions.
> http://p.sf.net/sfu/oracle-sfdevnl
> _______________________________________________
> Openipmi-developer mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/openipmi-developer

