[adding openib-general CC]

> I have a question about using GM + OpenIB at the same time, it seems
> to be causing bad things to happen (process goes into state D) :-).
> Here is the issue:
>
> In Open MPI we allow striping of an MPI message across multiple
> interconnects at once. In this case I am using GM and OpenIB. This is
> using an RDMA pipeline protocol which attempts to overlap
> registration and communication (RDMA Write). In the protocol the
> target registers a chunk of the message and sends an RDMA Write
> request to the origin, the origin then registers the corresponding
> chunk of memory and initiates an RDMA Write. Upon completion of the
> RDMA Write an RDMA FIN message is sent from the origin to the target.
> The target is allowed to have 4 RDMA Write requests outstanding at
> any time.
>
> As an example, lets say that the user buffer extends from address 3
> through 12200. The target begins by registering lets say address 3 -
> 8000 with OpenIB, under the covers the addresses are page aligned so
> we actually register from 0 through 8191. An RDMA Write request is
> sent to the origin, note that the origin will only RDMA Write into
> addresses 3 - 8000.
>
> The target then begins registering address 8001 through 12200 with
> GM, again under the covers the addresses are page aligned so we
> actually register from 4096 through 12287 and send an RDMA Write
> request to the origin. Again note that the origin will only RDMA
> Write into address 8001 through 12200.
>
> The problem is that when this occurs the process goes into D state
> (uninterruptible sleep). After this occurs I am still able to use GM
> and OpenIB individually and can even attempt to use them together
> (with the result of the process again going into state D).
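To make the page-alignment arithmetic in the quoted example concrete, here is a minimal sketch. It assumes 4096-byte pages (which is what the quoted numbers imply); the helper names `page_align` and `overlap` are made up for illustration and are not part of Open MPI, GM, or OpenIB. It only shows that the two registrations end up sharing one page; whether that shared page is what triggers the D state is not established here.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, consistent with the quoted example


def page_align(first, last, page_size=PAGE_SIZE):
    """Return the inclusive page-aligned range covering bytes first..last."""
    start = (first // page_size) * page_size
    end = (last // page_size + 1) * page_size - 1
    return start, end


def overlap(a, b):
    """Return the inclusive intersection of two ranges, or None if disjoint."""
    lo = max(a[0], b[0])
    hi = min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


# Chunk registered with OpenIB: user addresses 3..8000
openib_range = page_align(3, 8000)
# Chunk registered with GM: user addresses 8001..12200
gm_range = page_align(8001, 12200)

print("OpenIB registers:", openib_range)
print("GM registers:    ", gm_range)
print("Registered by both:", overlap(openib_range, gm_range))
```

With the numbers from the example, the page spanning 4096 through 8191 is covered by both the OpenIB and the GM registration, even though the RDMA Writes themselves target disjoint byte ranges.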
Finding out where the process is sleeping would probably be useful.
You can do "cat /proc/<pid>/wchan" to get a little info. Even better
would be to do "echo t > /proc/sysrq-trigger" and send the complete
kernel log messages that it produces (and also include the PID that
is stuck in uninterruptible sleep).

However, I think it will probably be up to Myricom to debug this in the
end -- my ability to figure out what's happening is very limited
without the GM sources, and I'm not that interested in debugging
someone else's proprietary software anyway.

 - R.
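If it helps with collecting the wchan info, here is a rough sketch that lists every process currently in state D together with the contents of its /proc/<pid>/wchan. It assumes a Linux /proc layout where /proc/<pid>/stat begins with "pid (comm) state"; the function names are made up for illustration, and reading wchan for other users' processes may need root.

```python
import os


def parse_stat(stat_text):
    """Extract (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lparen = stat_text.index("(")
    rparen = stat_text.rindex(")")
    pid = int(stat_text[:lparen])
    comm = stat_text[lparen + 1:rparen]
    state = stat_text[rparen + 1:].split()[0]
    return pid, comm, state


def d_state_processes(proc="/proc"):
    """Return (pid, comm, wchan) for each process in uninterruptible sleep."""
    results = []
    if not os.path.isdir(proc):
        return results
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
            if state != "D":
                continue
            with open(os.path.join(proc, entry, "wchan")) as f:
                wchan = f.read().strip()
        except (OSError, ValueError):
            # The process exited, the file is unreadable, or the line
            # did not parse; skip it.
            continue
        results.append((pid, comm, wchan))
    return results


if __name__ == "__main__":
    for pid, comm, wchan in d_state_processes():
        print(pid, comm, wchan)
```

This only reports the single symbol from wchan; the sysrq task dump in the kernel log is still the more complete source of information.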
