I am testing an app in development on RHEL 4 U3 using uDAPL. The app runs OK on gen1 stacks, but fails on every OFED-based stack I have tried on RHEL 4 U3. The symptom is that RDMAs never get a completion: a completion notification is sent, but mthca_poll_cq() finds no completion. I debugged the problem to this: the memory for the completion queue is not pinned, and at some point the page struct changes *after* the HCA has been handed the address of the completion queue, so subsequent completions are written elsewhere in memory and the app hangs waiting for completion.
I hacked in the following to get the app running. I replaced the allocation of the completion buffer in libmthca,

    ret = posix_memalign(memptr, alignment, size);

with

    size = (size + (4096-1)) & ~(4096-1);
    *memptr = mmap(NULL, size, PROT_READ|PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS | MAP_LOCKED, 0, 0);

Is there a restriction on using completion queues on a RHEL 4 Update 3 kernel? Am I missing a patch?

Details in http://openib.org/bugzilla/show_bug.cgi?id=147

-Bill
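
P.S. For anyone who wants to try the same workaround outside libmthca, here is a minimal standalone sketch of it. The helper names round_up_to_page() and alloc_locked_buf() are hypothetical and are not part of libmthca. The sketch assumes Linux with glibc, queries the page size with sysconf() instead of hard-coding 4096, and passes -1 as the file descriptor for the anonymous mapping. It only illustrates the hack; it does not answer whether this is the right fix.

```c
#define _GNU_SOURCE          /* exposes MAP_ANONYMOUS and MAP_LOCKED on glibc */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round size up to a multiple of page; page must be a power of two. */
static size_t round_up_to_page(size_t size, size_t page)
{
    assert(page != 0 && (page & (page - 1)) == 0);
    return (size + (page - 1)) & ~(page - 1);
}

/*
 * Hypothetical helper (not a libmthca function): hand back page-aligned,
 * zero-filled memory that the kernel is asked to lock into RAM.
 * Returns 0 on success or an errno value on failure, following the
 * posix_memalign() convention. *mapped_len receives the rounded-up
 * length, which the caller needs later for munmap().
 */
static int alloc_locked_buf(void **memptr, size_t size, size_t *mapped_len)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return EINVAL;

    size_t len = round_up_to_page(size, (size_t)page);

    /* fd is -1 and offset is 0 because the mapping is anonymous. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS | MAP_LOCKED, -1, 0);
    if (p == MAP_FAILED)
        return errno;    /* e.g. EAGAIN or EPERM if RLIMIT_MEMLOCK is too low */

    *memptr = p;
    *mapped_len = len;
    return 0;
}
```

A caller would use alloc_locked_buf(&buf, size, &len) where posix_memalign(&buf, alignment, size) was used before, and release the buffer with munmap(buf, len) rather than free(). The locked-memory limit (ulimit -l) has to be large enough for the buffer, otherwise mmap() fails.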
