Roland Dreier wrote:
>
>     Bill> I am testing an app in development on RHEL 4 U3 using uDAPL.
>     Bill> The app runs OK on gen1 stacks, but cannot run on any OFED
>     Bill> based stack I have tried on RHEL 4 U3. The symptom is RDMAs
>     Bill> not getting completion. A completion notification is sent,
>     Bill> but mthca_poll_cq() finds no completion. I debugged the
>     Bill> problem to this: the memory for the completion queue is not
>     Bill> pinned and at some point the page struct changes *after* the
>     Bill> HCA has been handed the address of the completion queue, so
>     Bill> subsequent completions are written elsewhere in memory and
>     Bill> the app hangs waiting for completion.
>
> The memory should be pinned by the call to __mthca_reg_mr() in
> mthca_create_cq(), since the kernel will do get_user_pages() on the
> memory.
>
> By any chance, does your app do fork() or system() or something like that?

At first I thought a fork was the cause. However, I do not think that
get_user_pages(), and the reference count increment that goes with it,
guarantees that the page struct stays the same on RHEL 4 U3. I still
need to verify that, though.

I dumped the page struct in ib_umem_get() when the completion queue
memory was first registered. Then my DTO event thread, on a 10 second
timeout, created another (unused) EVD so that I could dump the page
struct of the first completion queue again in ib_umem_get(). Sure
enough, the page struct had changed. If I wrote some code that mapped
an address to the original page struct, I would probably see the
completions there.

-Bill
