Hi all,

We're seeing an IPMI-related performance problem on our production
servers, which I hope someone can help me with.  These are Dell
boxes so I've CC'd Matt in case the answer is known already (sorry
for the intrusion if not, Matt).

The background: every few minutes we see system time spikes
(occasional lengthy periods spent in the kernel), some more
prolonged than others.  Using the CPU event counters, oprofile
attributes the time to the port_inb() routine in ipmi_si.ko.
Below is one such prolonged sample, showing nearly 32% of the
measured CPU cycles in this routine (yikes!).

CPU: P4 / Xeon with 2 hyper-threads, speed 2992.76 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 100000
samples  %        app name                 symbol name
6891     31.7807  ipmi_si.ko               port_inb
2096      9.6666  vmlinux                  schedule
777       3.5835  vmlinux                  mwait_idle
720       3.3206  vmlinux                  _spin_lock
469       2.1630  vmlinux                  _spin_unlock_irqrestore
443       2.0431  vmlinux                  _spin_unlock
426       1.9647  ext3.ko                  ext3_group_sparse
404       1.8632  ipmi_si.ko               kcs_event
343       1.5819  vmlinux                  _spin_lock_irqsave
340       1.5680  vmlinux                  find_next_bit
328       1.5127  vmlinux                  timer_interrupt
(... chopped remainder of opreport output for brevity ...).

$ sudo /sbin/lsmod | grep ipmi
ipmi_devintf           13385  2 
ipmi_si                37449  1 
ipmi_msghandler        32041  2 ipmi_devintf,ipmi_si
$ dmesg | grep -i ipmi
ipmi message handler version 33.13
IPMI System Interface driver version 33.13, KCS version 33.13, SMIC version 33.13, BT version 33.13
ipmi_si: Found SMBIOS-specified state machine at I/O address 0xca8, slave address 0x20
 IPMI kcs interface initialized
ipmi device interface version 33.13
(this is a RHEL4 kernel).

I don't have a great deal of knowledge about IPMI, unfortunately.
From what I can intuit from reading some code (and take this with
a grain of salt given the above statement), and from observing the
/proc/ipmi/0/si_stats file during these times, we seem to be seeing
large bursts of "short_timeouts".  This, together with oprofile
pointing at port_inb(), suggests we may be going through the
SI_SM_CALL_WITH_DELAY branch in smi_timeout() a fair bit (with a
device poll inside the timeout handling code) - does that sound
feasible?
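
For anyone wanting to reproduce the si_stats observation, here is a
rough Python sketch of how the counters could be sampled once a second
so that a burst of "short_timeouts" shows up as a large delta.  It
assumes every line of the file has the form "name: integer" (as with
the "interrupts_enabled: 0" line mentioned further down); the function
names are mine and purely illustrative, not part of any IPMI tooling.

```python
import time


def parse_si_stats(text):
    """Parse 'name: value' lines into a dict of ints.

    Assumption: each counter sits on its own line as 'name: integer'.
    Lines that do not match that shape are skipped.
    """
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue
        try:
            stats[name.strip()] = int(value.strip())
        except ValueError:
            continue
    return stats


def counter_deltas(before, after):
    """Return only the counters whose value changed between two samples."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] != before[name]
    }


def watch(path="/proc/ipmi/0/si_stats", interval=1.0):
    """Print the changed counters every `interval` seconds until interrupted."""
    with open(path) as f:
        previous = parse_si_stats(f.read())
    while True:
        time.sleep(interval)
        with open(path) as f:
            current = parse_si_stats(f.read())
        print(time.strftime("%H:%M:%S"), counter_deltas(previous, current))
        previous = current
```

Calling watch() on an affected box and lining the timestamps up
against the system time spikes should show whether the two really
coincide.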

If that's the case, then the use of an IRQ would seem to be an ideal
way to address this issue (smi_timeout() comment says "Running with
interrupts, only do long timeouts.").  So, I unloaded all of the
IPMI kernel drivers, and attempted to run with irqs=X,Y,Z settings.
That didn't seem to work though - the /proc/ipmi/0/si_stats file
reported "interrupts_enabled: 0" still, and I didn't see any kernel
messages that told me the option had been accepted/rejected.

Thanks for reading this far!  I guess my questions at the moment
are: does the above reasoning make sense?  And, how do I know what
IRQ number to use (I just picked free ones from /proc/interrupts)?
And how do I know if our hardware supports IPMI in interrupt mode?

cheers.

-- 
Nathan


_______________________________________________
Openipmi-developer mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/openipmi-developer