On 11/27/2018 09:08 AM, Jörn Schumacher wrote:
On 11/26/2018 07:44 PM, Jason Gunthorpe wrote:
On Mon, Nov 26, 2018 at 04:42:49PM +0100, Jörn Schumacher wrote:
On 11/19/2018 08:42 PM, Hefty, Sean wrote:
The only alternative I can think of is to try a normal registration
call, and if that fails, try again using the physical flag.  Would
this work, or does the normal registration call succeed, but produce
an unusable MR?

This would not work because of a subtlety of the physical memory
registration: NULL is actually passed as the address in the call. See the
GitHub link to my patch in the other e-mail; there is a line that replaces
the address with NULL.

If a user passes an illegal virtual address, the call should fail. But if
the libfabric call falls back to physical address registration, it would
actually succeed, because the address is replaced with NULL.

I looked back at the patches and related documentation.  IMO, the verbs physical memory registration interface is just weird.  There is no association between the actual pages and the region AFAICT.

Indeed this is a rather strange extension.

I came across a potential solution: adjust our driver so that it produces memory that is compatible with the RDMA stack in the kernel. Supposedly there is an alternative to remap_pfn_range. In that case we would not need the physical memory registration in libfabric anymore, and the overall solution would be cleaner (not dependent on the verbs provider).

Several other people have been interested in this, I think many would
appreciate it if you share your solution to the linux-rdma mailing
list.

Yes, I was thinking of writing this up for the list, will do that in the next couple of days after running some tests.

I posted the solution we found to the Linux-RDMA mailing list; I copy it below for reference on this list.

In the end I do not think we need support in libfabric for the physical address registration; the other solution we found seems a lot cleaner.

Thanks for the help & cheers,
Jörn

---

Eventually we found a solution that works for our use case. I would like to share it here in case somebody stumbles over this thread with a similar problem.

To summarize the problem once more: We have a driver that manages large buffers that are used by a PCIe device for DMA writes. We would like to use these buffers in RDMA calls, but the ibv_reg_mr call fails because the mmap'ed memory address is incompatible with the RDMA driver stack.
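
For anyone who stumbles over this with a similar setup, here is a minimal sketch of the pattern that fails for us; the device node name and buffer size are hypothetical placeholders:

    /* Hedged sketch: mmap a buffer exposed by our driver and try to
     * register it with verbs.  /dev/my_dma_dev and BUF_SIZE are
     * hypothetical placeholders; error handling is minimal. */
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <infiniband/verbs.h>

    #define BUF_SIZE (64UL * 1024 * 1024)

    static struct ibv_mr *register_driver_buffer(struct ibv_pd *pd)
    {
        int fd = open("/dev/my_dma_dev", O_RDWR);
        if (fd < 0)
            return NULL;

        void *buf = mmap(NULL, BUF_SIZE, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
        if (buf == MAP_FAILED)
            return NULL;

        /* This call fails when the driver backs the mapping with
         * remap_pfn_range(), because the kernel RDMA stack pins the
         * region via get_user_pages(), which cannot handle such pages. */
        return ibv_reg_mr(pd, buf, BUF_SIZE,
                          IBV_ACCESS_LOCAL_WRITE |
                          IBV_ACCESS_REMOTE_READ |
                          IBV_ACCESS_REMOTE_WRITE);
    }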

The driver mentioned above was written by Markus and is not published anywhere right now, but the code could be shared (without guarantee of support) if it is of interest to anybody.

In fact there are two approaches that work.

Approach 1:
There is a verbs extension that allows the registration of physical addresses. This verb is not available in the mainline kernel, but for example the Mellanox OFED driver supports it. The concept is written up in [1], but in a nutshell it involves calling ibv_exp_reg_mr with the IBV_EXP_ACCESS_PHYSICAL_ADDR flag. The call is not actually associated with any memory address, but rather registers the full physical address space.

The *physical* address can then be used in verb calls. Our driver exposes the physical address of managed memory to userspace, so this approach works fine.
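
For reference, a hedged sketch of what the registration looks like, following the experimental verbs API described in [1] (field names may differ between MOFED versions):

    /* Register the full physical address space via the MOFED
     * experimental verbs (Approach 1).  Based on [1]; not available in
     * the mainline kernel / rdma-core. */
    #include <infiniband/verbs_exp.h>

    static struct ibv_mr *register_physical_address_space(struct ibv_pd *pd)
    {
        struct ibv_exp_reg_mr_in in = {0};

        in.pd = pd;
        in.addr = NULL;   /* no virtual address is associated with the MR */
        in.length = 0;
        in.exp_access = IBV_EXP_ACCESS_LOCAL_WRITE |
                        IBV_EXP_ACCESS_REMOTE_READ |
                        IBV_EXP_ACCESS_REMOTE_WRITE |
                        IBV_EXP_ACCESS_PHYSICAL_ADDR;

        /* The returned MR covers the full physical address space; work
         * requests then carry *physical* addresses in their SGEs, e.g.
         * sge.addr = <physical address of the driver buffer>,
         * sge.lkey = mr->lkey. */
        return ibv_exp_reg_mr(&in);
    }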

To get this to play together with libfabric we had to patch it slightly [2]. However, this is unlikely to land in mainline libfabric.


Approach 2:
The other idea is to mmap the device driver's memory into user space in a way that is compatible with the RDMA drivers. Our original driver uses remap_pfn_range, which works fine, but the resulting memory is not compatible with the get_user_pages call that the Linux RDMA drivers use. An alternative to remap_pfn_range is to provide an implementation of the nopage method for the mapping VMA. This is described in detail in Rubini's book [3].

An mmap using the "nopage" approach produces a mapping that is compatible with get_user_pages. Hence, the virtual address of such a mapping can be used directly in any libibverbs or libfabric calls.
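
For reference, a minimal sketch of what this looks like on the driver side, assuming the buffer is allocated with vmalloc_user() so that every page is backed by a struct page. struct my_dev and my_mmap are hypothetical names, and the .fault callback (the modern form of the nopage method) has a signature that varies between kernel versions:

    #include <linux/fs.h>
    #include <linux/mm.h>
    #include <linux/vmalloc.h>

    struct my_dev {
        void   *buf;        /* allocated with vmalloc_user(buf_size) */
        size_t  buf_size;
    };

    /* Fault handler: hand back the struct page behind the faulting
     * offset instead of pre-mapping PFNs with remap_pfn_range().
     * This is the >= 4.17 signature returning vm_fault_t. */
    static vm_fault_t my_vma_fault(struct vm_fault *vmf)
    {
        struct my_dev *dev = vmf->vma->vm_private_data;
        /* Assumes the buffer is mapped at file offset 0. */
        unsigned long offset = vmf->address - vmf->vma->vm_start;
        struct page *page;

        if (offset >= dev->buf_size)
            return VM_FAULT_SIGBUS;

        page = vmalloc_to_page(dev->buf + offset);
        get_page(page);              /* hold a reference for the mapping */
        vmf->page = page;
        return 0;
    }

    static const struct vm_operations_struct my_vm_ops = {
        .fault = my_vma_fault,
    };

    static int my_mmap(struct file *filp, struct vm_area_struct *vma)
    {
        vma->vm_private_data = filp->private_data;  /* struct my_dev */
        vma->vm_ops = &my_vm_ops;
        /* No remap_pfn_range() here: pages are inserted on demand by
         * the fault handler, which keeps the VMA compatible with
         * get_user_pages(). */
        return 0;
    }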


We opted for the 2nd approach.


Cheers,
  Markus & Jörn


[1] https://community.mellanox.com/docs/DOC-2480
[2] https://github.com/joerns/libfabric/compare/v1.6.x...joerns:phys_addr_mr
[3] https://lwn.net/Kernel/LDD3/, Chapter 15 "Memory Mapping and DMA"