I reported:

>> I'm puzzling at an odd performance behavior that I see with various
>> versions of the OpenIPMI drivers.
>>
>> This is on a Dell PowerEdge 1800. The OS is VMware ESX 3.0.1,
>> specifically its Console OS portion, which is a modified Linux 2.4.21
>> (based on RHEL3). The IPMI drivers are various tweaked versions based
>> on Corey's v35, v37 and v39 releases.
>>
>> I don't believe the driver is using IPMI interrupts in any incarnation.
>> Some of the drivers I'm poking at have the "kipmi0" kernel thread, some
>> do not. Performance differs greatly between those versions, but in all
>> cases there is an anomaly.
>>
>> This anomaly is visible when watching the output of `ipmitool sdr` or
>> especially `ipmitool sensor`. For "sdr", what I see is that it takes an
>> exceptionally long time to read one particular sensor (ECC Corr Err).
>> For "sensor", that sensor _and_ all subsequent ones are slow.
Matt Domsch wrote:
> The hardware doesn't have an interrupt line, so no, it's not using
> interrupts. :-)
>
> The kernel thread is there exactly to trade off spare CPU cycles for
> faster response time from the BMC when interrupts aren't present.
> Ugly, but functional.

Right, I understand what it's for; I'm just trying to puzzle out its
performance characteristics on this hardware...

> It's not that unusual for some devices to take a long time to
> respond. That's purely a function of the BMC routine responsible for
> reporting that data. If it needs to walk the SEL counting entries,
> that could take a while. :-) If you want, I can try to get a
> definitive answer from the BMC firmware team.

Well, I suspect you may be talking to them after my further results; see
below.

Corey Minyard wrote:
> I really doubt the driver is the problem here, at least directly.

There's some driver complicity, as I will describe...

> My guess is that reading an ECC sensor requires sending a machine check
> to the main processor, then the main processor reads the value and
> returns it. Unless the BMC has some way to directly read the registers
> in the northbridge (JTAG maybe?). But either way, it will be a slower
> process than most other sensor reading, I would guess.

You're probably right about this. There's a qualitative difference between
the sensors that are fast (fans and temperatures -- sensors that the BMC
should have fairly direct access to) and those that are slow (various CPU
and/or chipset counters such as ECC error counts, parity error counts,
etc.). So that may explain the _fact_ of the performance difference, if
not the _magnitude_.

> The driver is just a conduit. The messages are all the same size, so it
> seems unlikely that it is the driver.
>
> If you can do a LAN connection to the box, then you can bypass the
> driver and test it that way. Otherwise, you will need to instrument the
> driver to know what is going on.

I don't know how to operate IPMI via LAN. I'm sure it's possible in my
setup, but I haven't made any attempt in that direction.

... So. One reason I was pursuing this anomaly was that, as I said, it got
_much_ worse with some driver changes. I have now gone back and serially
layered on all the patches I'm trying to integrate. The cause of the extra
slowdowns is the "Retryable return codes" patch,
http://www.mail-archive.com/[email protected]/msg00451.html
i.e.:

ipmi_msghandler.c:ipmi_smi_msg_received():

 	if ((msg->rsp_size >= 3)
 	    && (msg->rsp[2] != 0)
 	    && (msg->rsp[2] != IPMI_NODE_BUSY_ERR)
-	    && (msg->rsp[2] != IPMI_LOST_ARBITRATION_ERR))
+	    && (msg->rsp[2] != IPMI_LOST_ARBITRATION_ERR)
+	    && (msg->rsp[2] != IPMI_BUS_ERR)
+	    && (msg->rsp[2] != IPMI_NAK_ON_WRITE_ERR))

I instrumented this and found that the driver is getting lots of
IPMI_NAK_ON_WRITE_ERRs, and no IPMI_BUS_ERRs. Each of the slow sensors
hits exactly 5 IPMI_NAK_ON_WRITE_ERRs before completing. This number 5
corresponds to ipmi_msghandler.c:i_ipmi_request():

	if (addr->addr_type == IPMI_IPMB_BROADCAST_ADDR_TYPE)
	    retries = 0; /* Don't retry broadcasts. */
	else
-->	    retries = 4;

It's retrying the command 4 times (5 tries total) before failing. Even if
I set retries = 0 here, it's still much slower than without the check for
IPMI_NAK_ON_WRITE_ERR. Total runtime for `ipmitool sensor` goes from 5s
(no IPMI_NAK_ON_WRITE_ERR checking) to 24s (checking + 0 retries) to 112s
(checking + 4 retries). Is it right that there's a 1s delay on the failure
path, even when it's on its last (re-)try?

Anyway, for my setup, that patch is very harmful to performance.
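To sanity-check that reading of the numbers, here is a back-of-the-envelope
model (a standalone sketch, not driver code; the 1s-per-failed-attempt cost
and the count of ~19 slow sensors are my assumptions, back-solved from the
5s and 24s figures above):

/*
 * Back-of-the-envelope model of the timings above -- NOT driver code.
 * Assumptions (mine, inferred from the numbers quoted above):
 *   - each failed attempt on a "slow" sensor burns roughly one 1-second
 *     timeout before the IPMI_NAK_ON_WRITE_ERR completion comes back;
 *   - the message handler makes (retries + 1) attempts per command;
 *   - there are about 19 slow sensors (back-solved from the 24s figure).
 */
#include <stdio.h>

int main(void)
{
	const double fast_path    = 5.0;  /* runtime with no NAK checking (observed) */
	const double per_attempt  = 1.0;  /* cost of one timed-out attempt (assumed) */
	const int    slow_sensors = 19;   /* number of slow sensors (assumed)        */
	int retries;

	for (retries = 0; retries <= 4; retries += 4) {
		double total = fast_path + slow_sensors * (retries + 1) * per_attempt;
		printf("retries=%d -> predicted ~%.0fs for `ipmitool sensor`\n",
		       retries, total);
	}
	return 0;
}

That predicts roughly 24s for 0 retries and 100s for 4 retries; the
observed 112s suggests each extra try actually costs somewhat more than
1s, which is another hint that there is a fixed delay on the retry/failure
path.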
Meanwhile, on the trail of what's happening with the hardware, I return to
a slice of my original output:

##PS Redundancy |0x0 |discrete|0x0080|na|na|na|na|na|na
##Drive |0x0 |discrete|0x0080|na|na|na|na|na|na
##############ECC Corr Err|na |discrete|na |na|na|na|na|na|na
#####ECC Uncorr Err |na |discrete|na |na|na|na|na|na|na
#####I/O Channel Chk |na |discrete|na |na|na|na|na|na|na

I think those "na" outputs in column 4 mean that we're not getting any
information about those sensors. I should have noticed that earlier...

According to `ipmitool sdr list -v all`, all of the problematic sensors
correspond to "Entity ID: 34.6 (BIOS)". None of the happy sensors are
provided by the BIOS.

For the moment I am omitting the "Retryable return codes" patch from my
working environment; then these sensors fail immediately instead of
suffering five 1-second timeouts each.

Matt, is it expected that the BMC on a PE1800 can't get any sensor
readings from the BIOS?

>Bela<
