Hi, We are attempting to share memory between a PCI IO device and MPI running over IB. We want the IO device to write to a memory buffer, and then have MPI transfer the data from the buffer out over IB (with no copies), but we are having data corruption problems with what is received on the other end of the IB link.
The IO device requires hundreds of MBs of physically contiguous memory, so we applied the bigphysarea patch to 2.6. The bigphysarea patch allocates a large block of memory at boot time and then has permanent possession of it (so no worries about it being swapped out). We wrote a small kernel module to map the bigphysarea physical memory into user space via mmap. The user space process can work with the buffer just fine. But when we pass this buffer to MPI_Send(), data is not reliably received at the destination. In some cases data arrives correctly the first N times, but then incorrect data is received. On other cases (differing in buffer size, system configuration, etc.), the correct data is never received. If we used malloced or shared memory, the MPI data transmission works fine. We've been looking into how openib does the mapping, pinning, and freeing of the user space buffer, but we're getting a little lost between get_user_pages(), put_page(), and how it all relates to the page cache. Can someone please comment on whether there's someway for openib to accept this unusual memory? Thanks, Matt _______________________________________________ openib-general mailing list [email protected] http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
