Is this running over IBAL or over WinVerbs? Today, IBAL is responsible for tracking all memory registrations, and freeing them when the process exits. I assume WinVerbs does the same, though maybe not?
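For illustration only, here is a rough user-space sketch of the kind of per-owner bookkeeping described above: every registration is recorded against the handle that created it, and everything still outstanding for that handle is released in one pass when the handle goes away. This is not IBAL or WinVerbs code. The names (struct tracker, tracker_register, tracker_cleanup) and the integer owner id are hypothetical, and the "pages" counter only stands in for real locked pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one memory registration (not an IBAL type). */
struct registration {
    int owner;                   /* hypothetical id of the handle that registered it */
    size_t pages;                /* number of pages this registration "locked" */
    struct registration *next;
};

/* Hypothetical per-device list of outstanding registrations. */
struct tracker {
    struct registration *head;
    size_t locked_pages;         /* total pages still locked across all owners */
};

static void tracker_init(struct tracker *t)
{
    assert(t != NULL);
    t->head = NULL;
    t->locked_pages = 0;
}

/* Record a new registration. Returns 0 on success, -1 if allocation fails. */
static int tracker_register(struct tracker *t, int owner, size_t pages)
{
    struct registration *r = malloc(sizeof(*r));
    if (r == NULL)
        return -1;
    r->owner = owner;
    r->pages = pages;
    r->next = t->head;
    t->head = r;
    t->locked_pages += pages;
    return 0;
}

/*
 * Release every registration still owned by `owner` and return the number
 * of pages released. In a real driver this is roughly the work a cleanup
 * handler would have to do, unlocking the pages (for example with
 * MmUnlockPages) before the process address space is torn down.
 */
static size_t tracker_cleanup(struct tracker *t, int owner)
{
    size_t released = 0;
    struct registration **pp = &t->head;

    while (*pp != NULL) {
        struct registration *r = *pp;
        if (r->owner == owner) {
            *pp = r->next;       /* unlink, then free */
            released += r->pages;
            free(r);
        } else {
            pp = &r->next;
        }
    }
    t->locked_pages -= released;
    return released;
}
```

If tracker_cleanup() is never called for an exiting owner, locked_pages stays non-zero, which is the situation a DRIVER_LEFT_LOCKED_PAGES_IN_PROCESS bugcheck reports.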
The place to trap the process exiting is in IRP_MJ_CLEANUP, not IRP_MJ_CLOSE.

-Fab

> -----Original Message-----
> From: [email protected] [mailto:ofw-
> [email protected]] On Behalf Of Hefty, Sean
> Sent: Thursday, August 20, 2009 10:59 AM
> To: [email protected]
> Subject: [ofw] bugcheck in mlx4_bus
>
> I hit a bugcheck yesterday while running Intel MPI PingPong tests on a
> single node, scaling up the number of ranks from 2 to 64. The system
> is running Server 2003. A bugcheck analysis suggested adding the
> following registry value:
>
> HKLM\System\CurrentControlSet\Control\Session Mgr\Memory
> Mgmt\TrackLockedPages
>
> DWORD with a value of 1
>
> This produced the bugcheck below while re-running the MPI PingPong
> tests. I'm running checked drivers with free versions of the
> libraries. It's possible this is pointing to a cleanup issue higher in
> the stack. I'm trying to find more details.
>
> *******************************************************************************
> *                                                                             *
> *                        Bugcheck Analysis                                    *
> *                                                                             *
> *******************************************************************************
>
> DRIVER_LEFT_LOCKED_PAGES_IN_PROCESS (cb)
> Caused by a driver not cleaning up completely after an I/O.
> When possible, the guilty driver's name (Unicode string) is printed on
> the bugcheck screen and saved in KiBugCheckDriver.
> Arguments:
> Arg1: fffffadf8e0ae4f0, The calling address in the driver that locked
>     the pages or if the IO manager locked the pages this points to the
>     dispatch routine of the top driver on the stack to which the IRP
>     was sent.
> Arg2: 0000000000000000, The caller of the calling address in the driver
>     that locked the pages. If the IO manager locked the pages this
>     points to the device object of the top driver on the stack to which
>     the IRP was sent.
> Arg3: fffffadf980c6580, A pointer to the MDL containing the locked
>     pages.
> Arg4: 0000000000000021, The number of locked pages.
>
> Debugging Details:
> ------------------
>
> PEB is paged out (Peb.Ldr = 000007ff`fffda018). Type ".hh dbgerr001"
> for details
> PEB is paged out (Peb.Ldr = 000007ff`fffda018). Type ".hh dbgerr001"
> for details
>
> FAULTING_IP:
> mlx4_bus!register_segment+100 [c:\mshefty\scm\winof\branches\winverbs\hw\mlx4\kernel\bus\core\iobuf.c @ 197]
> fffffadf`8e0ae4f0 eb7d            jmp     mlx4_bus!register_segment+0x17f (fffffadf`8e0ae56f)
>
> DEFAULT_BUCKET_ID: DRIVER_FAULT
>
> BUGCHECK_STR: 0xCB
>
> PROCESS_NAME: IMB-MPI1.exe
>
> CURRENT_IRQL: f
>
> LAST_CONTROL_TRANSFER: from fffff8000107984c to fffff80001026cf0
>
> STACK_TEXT:
> fffffadf`8e16ee28 fffff800`0107984c : 0000fadf`8ee3aa62 00000000`00004cb6 00000000`00000000 00000000`00000000 : nt!RtlpBreakWithStatusInstruction
> fffffadf`8e16ee30 fffff800`010c514e : 00000000`04d18000 00000000`dffe0000 00000000`04d18000 fffffadf`9aad51b0 : nt!KdCheckForDebugBreak+0xb5
> fffffadf`8e16ee70 fffff800`010d89bb : fffffadf`8e0ae400 00000000`00000000 00000000`00000000 00000000`000000cb : nt!IoWriteCrashDump+0x851
> fffffadf`8e16f030 fffff800`0102e994 : fffff6fb`c0000000 fffff6fb`c0000000 fffffadf`988ba440 fffffadf`9b6b9340 : nt!KeBugCheck2+0xb83
> fffffadf`8e16f670 fffff800`01096f23 : 00000000`000000cb fffffadf`8e0ae4f0 00000000`00000000 fffffadf`980c6580 : nt!KeBugCheckEx+0x104
> fffffadf`8e16f6b0 fffff800`0127381a : fffffa80`01e7b960 fffffadf`8e16fc70 00000000`00000000 fffffadf`988ba440 : nt!MmCleanProcessAddressSpace+0x904
> fffffadf`8e16f720 fffff800`0127bb72 : fffffadf`0000007b 00000000`0000007b fffffadf`988ba488 00000000`00000000 : nt!PspExitThread+0xb4d
> fffffadf`8e16f9b0 fffff800`01038c30 : 00000000`00000000 fffffadf`8e16fcf0 00000520`657cb7f8 00000000`00000002 : nt!PsExitSpecialApc+0x1d
> fffffadf`8e16f9e0 fffff800`01027c3b : 00000000`00000000 fffffadf`8e16fa80 fffff800`0127bdc0 00000000`00000000 : nt!KiDeliverApc+0x504
> fffffadf`8e16fa80 fffff800`0102e3f2 : fffffadf`8e16fc18 00000000`00000000 00000000`00000001 fffffadf`9b8c6540 : nt!KiInitiateUserApc+0x7b
> fffffadf`8e16fc00 00000000`77ef0a6a : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceExit+0xad
> 00000000`0012f3a8 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x77ef0a6a
>
>
> STACK_COMMAND: .bugcheck ; kb
>
> FOLLOWUP_IP:
> mlx4_bus!register_segment+100 [c:\mshefty\scm\winof\branches\winverbs\hw\mlx4\kernel\bus\core\iobuf.c @ 197]
> fffffadf`8e0ae4f0 eb7d            jmp     mlx4_bus!register_segment+0x17f (fffffadf`8e0ae56f)
>
> FAULTING_SOURCE_CODE:
>    193:         }
>    194:
>    195:         __try { /* try */
>    196:                 MmProbeAndLockPages( mdl_p, mode, Operation ); /* lock memory */
> >  197:         } /* try */
>    198:
>    199:         __except (EXCEPTION_EXECUTE_HANDLER) {
>    200:                 MLX4_PRINT(TRACE_LEVEL_ERROR, MLX4_DBG_MEMORY,
>    201:                         ("MOSAL_iobuf_register: Exception 0x%x on MmProbeAndLockPages(), va %I64d, sz %I64d\n",
>    202:                         GetExceptionCode(), va, size));
>
>
> SYMBOL_NAME: mlx4_bus!register_segment+100
>
> FOLLOWUP_NAME: MachineOwner
>
> MODULE_NAME: mlx4_bus
>
> IMAGE_NAME: mlx4_bus.sys
>
> DEBUG_FLR_IMAGE_TIMESTAMP: 4a8d77d7
>
> FAILURE_BUCKET_ID: X64_0xCB_mlx4_bus!register_segment+100
>
> BUCKET_ID: X64_0xCB_mlx4_bus!register_segment+100
>
> Followup: MachineOwner
> ---------
>
> _______________________________________________
> ofw mailing list
> [email protected]
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw
