Adding the MLNX regression team. Gilad/Benyahu, can you add relevant information about the test scenario? It is related to internal bug 94586.
From: Smith, Stan [mailto:[email protected]]
Sent: Sunday, November 21, 2010 9:10 PM
To: Uri Habusha; Hefty, Sean; [email protected]
Subject: RE: ASSERT in cl_spinlock_acquire function in OSM

________________________________
From: Uri Habusha [mailto:[email protected]]
Sent: Saturday, November 20, 2010 1:21 PM
To: Hefty, Sean; [email protected]; Smith, Stan
Subject: ASSERT in cl_spinlock_acquire function in OSM

During our IPoIB regression we got an assert. The reason for the assert is that the spin lock wasn't initialized. I took a look at the osm_log object, but it looks corrupted to me (the log_file_name is wrong). Is it a known issue?

No.
Yes, the log filename is corrupted, as if the invalid memory access fault handler is using the osm log file name buffer/memory for its error log (sprintf) buffer.
Any idea which module contains the offending address 0x08004633`39010038?
The umad_port_id == -1 looks strange.
How are the test systems configured w.r.t. the IB fabric? Single IB switch? How many other systems are attached to the switch?
Is there just a single IPoIB transfer going on? Any cable pulls or system(s) shutting down? What type of IPoIB transfer was going on?
How might others reproduce this situation?

Uri

3: kd> kb
RetAddr           : Args to Child                                                           : Call Site
00000000`754954d9 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!DbgBreakPoint
00000000`fff01b72 : 00000000`007cef18 00000000`00000000 00000000`ffec5758 00000000`00222460 : complibd!cl_spinlock_acquire+0x39 [s:\builds\6861\trunk\inc\user\complib\cl_spinlock_osd.h @ 107]
00000000`fff62879 : 00000000`007cef10 00000000`00000010 00000000`ffeb652c 00000000`ffed8db8 : opensm!osm_log+0x1c2 [s:\builds\6861\trunk\ulp\opensm\user\opensm\osm_log.c @ 171]
00000000`fff0247e : 00000000`006127c0 00000000`00000100 00000000`007c6fb0 00000000`0116f9a0 : opensm!osm_vendor_get+0x49 [s:\builds\6861\trunk\ulp\opensm\user\libvendor\osm_vendor_ibumad.c @ 995]
00000000`fff608b3 : 00000000`0024f1b0 00000000`006127c0 00000000`00000100 00000000`0116f9a0 : opensm!osm_mad_pool_get+0xbe [s:\builds\6861\trunk\ulp\opensm\user\opensm\osm_mad_pool.c @ 95]
00000000`754a2d0a : 00000000`00612770 00000000`00000000 00000000`00000000 00000000`00000000 : opensm!umad_receiver+0x3b3 [s:\builds\6861\trunk\ulp\opensm\user\libvendor\osm_vendor_ibumad.c @ 314]
00000000`7712be3d : 00000000`00612770 00000000`00000000 00000000`00000000 00000000`00000000 : complibd!cl_thread_callback+0x1a [s:\builds\6861\trunk\core\complib\user\cl_thread.c @ 49]
00000000`77266a51 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : kernel32!BaseThreadInitThunk+0xd
00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x1d

3: kd> ??p_spinlock
struct _cl_spinlock * 0x00000000`007cef18
   +0x000 crit_sec         : _RTL_CRITICAL_SECTION
   +0x028 initialized      : 0

3: kd> ??p_log
struct osm_log * 0x00000000`007cef10
   +0x000 level            : 0x30 '0'
   +0x008 lock             : _cl_spinlock
   +0x038 count            : 0
   +0x03c max_size         : 0
   +0x040 flush            : 0
   +0x048 out_port         : (null)
   +0x050 accum_log_file   : 0
   +0x054 daemon           : 0
   +0x058 log_file_name    : 0x08004633`39010038 "--- memory read error at address 0x08004633`39010038 ---"
   +0x060 log_prefix       : 0x00000000`00222460 "p???"

3: kd> ??p_vend
struct _osm_vendor * 0x00000000`007cb340
   +0x000 p_log            : 0x00000000`007cef10 osm_log
   +0x008 ca_count         : 0x7c6590
   +0x010 p_ca_info        : (null)
   +0x018 timeout          : 0xbb8
   +0x01c max_retries      : 3
   +0x020 agents           : [32] 0x00000000`006127c0
   +0x120 ca_names         : [32] [64] "ibv_device0"
   +0x920 mtbl             : vendor_match_tbl
   +0x930 umad_port        : umad_port
   +0x9f8 cb_mutex         : 0x00000000`00000060
   +0xa00 match_tbl_mutex  : 0x00000000`00000064
   +0xa08 umad_port_id     : -1
   +0xa10 receiver         : (null)
   +0xa18 issmfd           : -1
   +0xa1c issm_path        : [256] ""

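
For anyone trying to reason about the trace above, the failure mode (a log call acquiring a lock whose "initialized" flag reads 0) can be illustrated with a minimal sketch in C. Every name here (demo_lock_t, demo_log_t, demo_log_write, and so on) is hypothetical and chosen for illustration; this is not the complib or OpenSM code, and it assumes only that the lock carries an "initialized" flag as the dump shows. Instead of breaking into the debugger, the sketch reports the condition as a status code so it can be checked in a test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a lock that records whether it was initialized.
 * owner_depth replaces the real OS critical section for illustration only. */
typedef struct demo_lock {
    int  owner_depth;
    bool initialized;
} demo_lock_t;

/* Hypothetical log object that embeds its own lock. */
typedef struct demo_log {
    int         level;
    demo_lock_t lock;
} demo_log_t;

typedef enum {
    DEMO_OK = 0,
    DEMO_NOT_INITIALIZED = -1
} demo_status_t;

static void demo_lock_init(demo_lock_t *l)
{
    l->owner_depth = 0;
    l->initialized = true;
}

/* Refuse to take a lock that was never initialized, or whose memory has
 * been zeroed or overwritten since initialization. */
static demo_status_t demo_lock_acquire(demo_lock_t *l)
{
    if (l == NULL || !l->initialized)
        return DEMO_NOT_INITIALIZED;
    l->owner_depth++;
    return DEMO_OK;
}

static void demo_lock_release(demo_lock_t *l)
{
    /* Debug-build check: releasing a lock that is not held is a bug. */
    assert(l != NULL && l->initialized && l->owner_depth > 0);
    l->owner_depth--;
}

/* A log write takes the lock first, so an uninitialized or clobbered
 * log object is detected here rather than deeper in the call chain. */
static demo_status_t demo_log_write(demo_log_t *log)
{
    demo_status_t st = demo_lock_acquire(&log->lock);
    if (st != DEMO_OK)
        return st;
    /* ... format and write the message here ... */
    demo_lock_release(&log->lock);
    return DEMO_OK;
}
```

Calling demo_log_write() on a zero-filled demo_log_t returns DEMO_NOT_INITIALIZED; after demo_lock_init() the same call returns DEMO_OK. Whether the real osm_log object was never initialized or was overwritten afterwards is not something this sketch can tell; it only shows why the acquire path trips in either case.
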
_______________________________________________
ofw mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw
