Ok, I'm finally back from all the traveling I have been doing, and I can
focus some effort on this.  Thanks for sending the full logs; they gave
me enough information to go on.

I just pulled the current git and I didn't have any problem, but I don't 
have a machine with a SMIC interface.  However, I can't see how that 
would make a difference in this case.

Looking at your trace and the code, I can't see anything wrong.

The place where it says:
    start_smic_transaction - 18 08
is the last message sent to the BMC as part of startup.  That message is
sent by the "get_guid()" function in ipmi_msghandler.c, and it does get
a response.  It returns an error response, but that's ok, as many
systems do not have a GUID, and either way it should kick off the
initialization code again (and the GUID code has been there a while).
At this point the IPMI interface is fully set up and operational; it's
just doing some housekeeping to set up the sysfs and proc information.
(The channel scan will not occur because this is an IPMI 1.0 system.)

I don't think bisecting is going to help, as the code changes in 
question are not going to have anything to do with where the break 
appears to occur.  From what I can tell, one of the following things is 
happening:

    * Somehow the wakeup to get_guid() is not happening properly.
    * add_proc_entries() in ipmi_msghandler.c is hanging someplace.
    * The sysfs initialization in ipmi_bmc_register() is hanging.
    * The proc entries added in try_smi_init() in ipmi_si_intf.c are
      hanging.

So, can you check the following after attempting to load the module?

    * Can you look in /proc/ipmi and /proc/ipmi/0 and see what exists?
      If /proc/ipmi/0 exists and contains the version, ipmb, and stats
      files, that means add_proc_entries() succeeded.  If type,
      si_stats, and params also exist, that means initialization
      should be complete.
    * Can you look in /sys/class/ipmi?  If ipmi0 exists there, that
      means the sysfs code probably worked ok.
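
If it is easier, the two checks above could be scripted.  The following
is only a rough sketch in Python: the paths and file names are the ones
listed above, but the function name, the result keys, and the idea of
passing the directories in as parameters are my own assumptions for
illustration.

```python
import os

# File names taken from the checks described above.
MSGHANDLER_FILES = ("version", "ipmb", "stats")   # added by add_proc_entries()
SI_FILES = ("type", "si_stats", "params")         # added by try_smi_init()


def check_ipmi_state(proc_dir="/proc/ipmi", sysfs_dir="/sys/class/ipmi", intf=0):
    """Report which of the expected proc and sysfs entries exist."""
    intf_dir = os.path.join(proc_dir, str(intf))

    def all_present(names):
        # True only if every named file exists under the interface directory.
        return all(os.path.exists(os.path.join(intf_dir, n)) for n in names)

    return {
        "proc_ipmi": os.path.isdir(proc_dir),
        "proc_intf": os.path.isdir(intf_dir),
        "add_proc_entries_ok": all_present(MSGHANDLER_FILES),
        "si_init_complete": all_present(SI_FILES),
        "sysfs_ok": os.path.exists(os.path.join(sysfs_dir, "ipmi%d" % intf)),
    }


if __name__ == "__main__":
    for name, ok in check_ipmi_state().items():
        print("%-20s %s" % (name, "yes" if ok else "no"))
```

Run it from another shell after the modprobe attempt; whichever entry
first reports "no" roughly narrows down which of the candidates listed
earlier is the one hanging.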

I'm guessing this is either some transient bug someplace else or some 
latent bug in the IPMI code that is being exercised by some other change.

If you are really adventurous, you could compile the kernel with the
MAGIC_SYSRQ config option enabled, then do a sysrq-T to get a backtrace
of all tasks.  Then hunt down the modprobe task.  A serial console is
best for this, if you have one, because you can easily capture the
output to a file.  That would tell us exactly where it is hanging.
However, we are getting pretty far into the kernel hacker realm here.
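
Once the sysrq-T output is saved to a file, hunting down the modprobe
task can be done with a few lines of Python.  This is a minimal sketch
that makes no assumption about the exact dump format, which varies
between kernel versions: it just returns every line that mentions the
task name along with a fixed number of following lines.  The function
name and the default of 30 lines are hypothetical choices of mine.

```python
import re


def task_context(log_lines, task="modprobe", after=30):
    """Return each line mentioning `task` plus the `after` lines that follow it."""
    pattern = re.compile(r"\b%s\b" % re.escape(task))
    blocks = []
    for i, line in enumerate(log_lines):
        if pattern.search(line):
            # Slice is clamped automatically at the end of the log.
            blocks.append(log_lines[i:i + after + 1])
    return blocks
```

Call it with the lines of the captured console log, for example
task_context(open("console.log").read().splitlines()), where
"console.log" stands for whatever file you captured the serial output
to.  The backtrace that follows the modprobe line is the part that
matters.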

Thanks,

-corey
