On Fri, 2005-11-18 at 23:12, Thomas Moschny wrote:
> Thomas Moschny wrote:
> > Exiting SM
> >
> > *** glibc detected *** double free or corruption (!prev):
> > 0x6000000000067970 ***
> > Aborted

On what processor architecture is opensm running? Note that some better handling of opensm exiting went in at r3977, which is slightly past this (r3965).

> > Subsequent runs of opensm hang in flush_cpu_workqueue or
> > rwsem_down_failed_common.

Sounds like something isn't cleaning up properly when the previous instance exits. After the error, is there an opensm instance still around? If so, it wouldn't clean up some MAD registrations.

> Doug Ledford wrote:
> > BTW, can you try forcing opensm to run single threaded on it's first
> > invocation and see if that fixes this?
>
> Did you mean calling opensm with -d1?

That would force single thread mode. You should see something like this when opensm starts up:

opensm -d1
-------------------------------------------------
OpenSM Rev:openib-1.1.0
Command Line Arguments:
 d level = 0x1
 Debug mode: Forcing Single Thread
 Log File: /var/log/osm.log
-------------------------------------------------
OpenSM Rev:openib-1.1.0

> Well, currently I can't see any
> consistent behavior, but if called with -d1 on the *second* -o run,

What is the state of the subnet?

> it doesn't seem to hang (unless there are already some unkillable instances
> on this machine from earlier runs).

Did you check with ps for opensm instances?

-- Hal
