Bela Lubkin wrote:
>
> Corey (or anyone), do you have a tool which parses these logs
> and translates the unexpected netfn messages to a human-readable
> representation like the "Get SDR" you show above? I can do it
> manually (or write a script), but don't want to duplicate effort
> if it already exists.
>

Well, no, not really. Those logs are not supposed to come out, so IMHO there's not much value in decoding them. They are really more of a "something is going wrong; here's hopefully enough information for you to fix it" log.
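No such decoder exists, per the reply above. For anyone writing the script Bela mentions, here is a minimal sketch. It assumes the log line gives the raw message bytes in KCS order, with byte 0 holding the NetFn (bits 7:2) and LUN (bits 1:0) and byte 1 holding the command code. Odd NetFn values are responses to the even request NetFn just below them. The function name `ipmi_decode_msg` is hypothetical, and the lookup table is only a small subset of the IPMI command set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* One entry of a (partial, hypothetical) NetFn/command name table.
 * NetFn is stored in its even "request" form. */
struct ipmi_cmd_name {
    uint8_t netfn;
    uint8_t cmd;
    const char *name;
};

static const struct ipmi_cmd_name cmd_names[] = {
    { 0x00, 0x01, "Get Chassis Status" },      /* Chassis */
    { 0x04, 0x2D, "Get Sensor Reading" },      /* Sensor/Event */
    { 0x06, 0x01, "Get Device ID" },           /* App */
    { 0x0A, 0x20, "Get SDR Repository Info" }, /* Storage */
    { 0x0A, 0x22, "Reserve SDR Repository" },
    { 0x0A, 0x23, "Get SDR" },
};

/* Format a human-readable description of msg[0..len) into buf.
 * Returns the snprintf() result, or -1 if the message is too short. */
int ipmi_decode_msg(const uint8_t *msg, size_t len, char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);
    if (len < 2)
        return -1;                          /* need NetFn/LUN and cmd */

    uint8_t netfn = msg[0] >> 2;            /* bits 7:2 */
    uint8_t lun = msg[0] & 0x3;             /* bits 1:0 */
    uint8_t cmd = msg[1];
    int is_response = netfn & 1;            /* odd NetFn = response */
    uint8_t req_netfn = (uint8_t)(netfn & 0xFE);

    const char *name = "Unknown command";
    for (size_t i = 0; i < sizeof cmd_names / sizeof cmd_names[0]; i++) {
        if (cmd_names[i].netfn == req_netfn && cmd_names[i].cmd == cmd) {
            name = cmd_names[i].name;
            break;
        }
    }
    return snprintf(buf, buflen, "%s %s (netfn 0x%02x, cmd 0x%02x, lun %u)",
                    name, is_response ? "response" : "request",
                    (unsigned)netfn, (unsigned)cmd, (unsigned)lun);
}
```

Keying the table on the request NetFn lets one entry name both halves of an exchange, which is what you want when a log shows a response arriving for the wrong request.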
>> You should report this to Sun, though if everything else is working
>> correctly and it's not spewing out these errors it shouldn't affect
>> normal operations very much.
>
> We have seen cases where it spews, and others where it's
> intermittent but, when it happens, it does mess up some IPMI-based
> monitoring. And I think there are some cases where it's just a
> sporadic complaint with no noticeable consequences.
>
> I think at least some cases of this are due to SMM (System
> Management Mode, a special out-of-band CPU mode that runs
> underneath the host OS). An SMI (SMM interrupt) comes in, and
> SMM BIOS code executes and either reads the response you were
> expecting or sends a new command whose response you then read.
> If this is the case, BIOS authors have to fix it by eliminating
> the IPMI access, using a different channel or interface, or
> ensuring that they never issue a command while a response is
> pending and always consume the response after issuing a command
> (not sure if that last method is actually viable).
>

That's an interesting theory, but I really can't imagine that's the problem. Surely they have a different channel for that information; otherwise it would be very hard to account for in the driver. Plus, if that were happening, the BIOS would flush out the interface, start its own message, and finish it. The driver would then either see the interface in a strange state or time out; it wouldn't see a wrong message. And it wouldn't explain the off-by-one errors Andy is seeing.

I can only think of two things that could cause this:

1. A new message could get started while one is in progress. I can't see how that could happen, though: there's a queue in ipmi_si_intf.c, and start_kcs_transaction() should reject a new message if it's not in the idle state. If something got messed up, you should see timeouts.

2. A message gets freed while in use, then gets reused. But that doesn't really explain the symptoms, especially the "off by one" problem Andy is seeing.
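The two invariants this argument leans on can be sketched as a tiny state machine. First, a new transaction is refused unless the interface is idle. Second, a completed response must carry the request's NetFn + 1 and the same command code, and anything else is the "unexpected netfn" case. This is an illustrative user-space sketch only. `struct xfer_sm`, `xfer_start()` and `xfer_finish()` are hypothetical names and do not reproduce the actual code in ipmi_si_intf.c or start_kcs_transaction().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define XFER_MAX 64                 /* hypothetical max request size */

enum xfer_state { XFER_IDLE, XFER_BUSY };

struct xfer_sm {
    enum xfer_state state;
    uint8_t req[XFER_MAX];          /* copy of the pending request */
    size_t req_len;
};

/* Start a transaction only from the idle state, as described for
 * start_kcs_transaction(). Returns 0, -1 if busy, -2 if malformed. */
int xfer_start(struct xfer_sm *sm, const uint8_t *req, size_t len)
{
    assert(sm != NULL && req != NULL);
    if (sm->state != XFER_IDLE)
        return -1;                  /* a message is already in progress */
    if (len < 2 || len > XFER_MAX)
        return -2;                  /* need at least NetFn/LUN and cmd */
    memcpy(sm->req, req, len);
    sm->req_len = len;
    sm->state = XFER_BUSY;
    return 0;
}

/* Match a completed response against the pending request: response
 * NetFn must be request NetFn + 1 and the command codes must agree.
 * Returns 0 on match, -2 on mismatch, -1 if nothing was pending.
 * The machine returns to idle either way. */
int xfer_finish(struct xfer_sm *sm, const uint8_t *rsp, size_t len)
{
    int ok;

    if (sm->state != XFER_BUSY)
        return -1;
    ok = len >= 2
         && (rsp[0] >> 2) == (sm->req[0] >> 2) + 1
         && rsp[1] == sm->req[1];
    sm->state = XFER_IDLE;
    return ok ? 0 : -2;
}
```

With both guards in place, a stray or interleaved message should surface either as a refused start or as a mismatch on a response that reaches `xfer_finish()`. That is why a wrong-but-well-formed response points more toward a driver bookkeeping problem than toward an interface being reset underneath it.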
So I'm kind of out of ideas. It's time for bed here, and I'll spend some time thinking about it. I have the test running now on a machine, trying to reproduce the problem. Telling whether it's really the hardware would require instrumenting the driver to keep a trace buffer of bytes written/received and dump the trace buffer when the error occurs. I don't think it's time for that yet; it kind of looks like a driver issue.

-corey

_______________________________________________
Openipmi-developer mailing list
https://lists.sourceforge.net/lists/listinfo/openipmi-developer
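Corey's suggested instrumentation, a trace buffer of bytes written/received that is dumped when the error occurs, is essentially a fixed-size ring buffer. Below is a minimal user-space sketch under that assumption. The struct and function names are hypothetical, and in the real driver the add calls would sit wherever bytes are written to or read from the KCS data register, with the dump on the error path (using the kernel's own logging and locking rather than stdio).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define TRACE_SIZE 256u  /* ring capacity; must be a power of two */

/* Hypothetical byte-level trace of one KCS interface. */
struct byte_trace {
    uint8_t byte[TRACE_SIZE];  /* data byte seen on the interface */
    char dir[TRACE_SIZE];      /* 'W' = written to BMC, 'R' = read back */
    size_t head;               /* total bytes ever recorded */
};

/* Record one byte; once full, the oldest entry is overwritten. */
void trace_add(struct byte_trace *t, char dir, uint8_t b)
{
    size_t i = t->head & (TRACE_SIZE - 1);
    t->byte[i] = b;
    t->dir[i] = dir;
    t->head++;
}

/* Number of entries currently retained (at most TRACE_SIZE). */
size_t trace_count(const struct byte_trace *t)
{
    return t->head < TRACE_SIZE ? t->head : TRACE_SIZE;
}

/* Slot holding the k-th oldest retained entry (k = 0 is the oldest). */
size_t trace_index(const struct byte_trace *t, size_t k)
{
    assert(k < trace_count(t));
    return (t->head - trace_count(t) + k) & (TRACE_SIZE - 1);
}

/* Dump oldest to newest, e.g. from the "unexpected netfn" error path. */
void trace_dump(const struct byte_trace *t, FILE *out)
{
    for (size_t k = 0; k < trace_count(t); k++) {
        size_t i = trace_index(t, k);
        fprintf(out, "%c %02x\n", t->dir[i], (unsigned)t->byte[i]);
    }
}
```

Recording only the last few hundred bytes keeps the overhead small enough to leave enabled while waiting for an intermittent failure, and the dump then shows exactly which bytes preceded the bad response.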
