On Wed, 2006-09-13 at 12:00 +0300, Or Gerlitz wrote:
> Ralph Campbell wrote:
> > Problem:
> >
> > The IB kernel to IB device driver interface uses dma_map_single()
> > and dma_map_sg() to allocate device bus addresses for HW DMA.
> > These bus addresses are passed to the IB device driver via
> > ib_post_send() and ib_post_recv().
> >
> > The ib_ipath driver needs kernel virtual addresses in order to be
> > able to copy data to/from the posted work requests, since it does
> > not use HW DMA. It currently relies on the mapping being one-to-one
> > and cannot reasonably reverse the mapping when an IOMMU is present.
>
> Oops, please note that one can obtain, through the DMA API, a DMA
> address for a page which is currently **not** mapped into the kernel
> virtual address space (that is, page_address(p) is NULL), so you must
> add kmap and kunmap to your fast RX/TX code path.
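
For illustration, the software copy path would then need something
along these lines (only a sketch; the wrapper name is made up,
kmap()/kunmap() are the real interfaces, and kmap_atomic() would be
used instead in contexts that cannot sleep):

    #include <linux/highmem.h>
    #include <linux/mm.h>
    #include <linux/string.h>

    /*
     * Sketch only: copy 'len' bytes out of a page that may not have a
     * permanent kernel mapping (page_address(page) == NULL for highmem).
     */
    static void ipath_copy_from_page(void *dst, struct page *page,
                                     unsigned long offset, size_t len)
    {
            void *vaddr = kmap(page);  /* temporary mapping for highmem pages */

            memcpy(dst, vaddr + offset, len);
            kunmap(page);
    }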
Yes, these are called "high pages".

> Examples of scenarios where this happens that I can think of are
> direct I/O and some sort of pre-fetching done by the file system.
> Some pages present in a kernel SG list which needs to be
> sent/received/RDMA-ed over IB need not be mapped into the kernel
> virtual address space.

Well, the other parts of the kernel might not need a kernel virtual
address, but the ib_ipath driver still does.

> As for RDMA, please note that the problem has two faces: first, the
> remote device which does the RDMA or which the local device does RDMA
> from/to, and second, the local device.
>
> Since you need to be able to interoperate between devices that
> support DMA mappings and ones which do not, how do you suggest
> managing the addresses for the following schemes (1 stands for a
> device supporting DMA addresses and 0 for a device which does not)?
>
> <1,1>
> <1,0>
> <0,1>
> <0,0>
>
> Please assume for the purpose of discussion that each side knows the
> polarity of the remote side.
>
> After writing the section on RDMA I think I might have gone in the
> wrong direction, since ipath emulates RDMA in SW; can you shed some
> light on this?

I don't understand what you are talking about. There is an IB wire
protocol for RDMA, SEND, etc., and that doesn't change depending on
the HCA. The InfiniPath HCA has a ring buffer of receive buffers, and
all incoming IB packets are DMA'ed into one of these buffers. The
ib_ipath software driver examines each packet and copies it to the
appropriate address. For a packet received with RC_RDMA_WRITE_FIRST,
the RKEY and IB address are used to convert that into a kernel virtual
address and the data is copied. The same happens for RC_SEND_FIRST,
but the kernel virtual address comes from the LKEY and address in the
work request posted by ib_post_recv(). Sending data is similar: the
driver constructs a packet with the appropriate opcode and writes it
to the chip, which puts it on the wire. (A rough sketch of that
receive path is appended at the end of this mail.)

> > I also tried proposing adding a flag to the ib_device structure
> > and modifying the kernel IB code to check the flag and pass
> > either the dma_*() mapped address or a kernel virtual address.
> > This works OK for kmalloc() buffers where dma_map_single() is
> > being called but doesn't work well for SRP, which has lists
> > of physical pages and calls dma_map_sg().
> > It also means that the kernel IB layer needs to explicitly handle
> > two different kinds of addresses.
>
> Just a note: it's not just SRP there... it's any ULP which needs to
> move data over IB that is present in a bunch of pages (e.g. packed in
> a kernel SG list), namely iSER, NFSoRDMA, Lustre, a native IB
> implementation of send_page(), etc.

Sure. In each such case, the code would need to be modified to use the
ib_dma_*() routines instead of dma_*() for addresses used with the
LKEY/RKEY returned from ibv_get_dma_mr().
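
To make that concrete, the sort of wrapper I have in mind would look
roughly like this (a sketch only; the 'uses_virtual_dma' flag is
invented for illustration and the final interface may well differ,
e.g. to cover the dma_map_sg() case):

    #include <linux/dma-mapping.h>
    #include <linux/types.h>
    #include <rdma/ib_verbs.h>

    /*
     * Sketch of a proposed wrapper.  A device that does real HW DMA
     * gets the usual dma_map_single() result; a software device like
     * ib_ipath simply gets the kernel virtual address back, which is
     * what it needs for its copy path.  'uses_virtual_dma' is a
     * hypothetical flag, not an existing ib_device field.
     */
    static inline u64 ib_dma_map_single(struct ib_device *dev,
                                        void *cpu_addr, size_t size,
                                        enum dma_data_direction direction)
    {
            if (dev->uses_virtual_dma)
                    return (u64) (unsigned long) cpu_addr;

            return dma_map_single(dev->dma_device, cpu_addr, size, direction);
    }

An ib_dma_map_sg() equivalent would need the same treatment; that is
the harder part for ULPs like SRP that hand the device a scatterlist
of physical pages.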

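P.S. Since it may help to see the receive path spelled out, here is a
very rough sketch of the handling for the first packet of an RDMA
WRITE. Every name below is made up for illustration; the real driver
also has to handle the rest of the RC protocol (PSNs, ACKs/NAKs, and
so on).

    #include <linux/errno.h>
    #include <linux/string.h>
    #include <linux/types.h>

    /* Fields of the IB RETH, already converted to host byte order. */
    struct reth_info {
            u64 vaddr;      /* remote (IB) virtual address being written */
            u32 rkey;
            u32 length;
    };

    /* Hypothetical lookup: validate the RKEY and translate the
     * advertised IB address into a kernel virtual address within the
     * registered region, or return NULL if the access is not allowed. */
    extern void *rkey_to_kvaddr(u32 rkey, u64 vaddr, u32 length);

    static int handle_rdma_write_first(struct reth_info *reth,
                                       void *payload, u32 payload_len)
    {
            void *kva = rkey_to_kvaddr(reth->rkey, reth->vaddr, reth->length);

            if (!kva)
                    return -EINVAL; /* bad RKEY or out of bounds: NAK it */

            /* No HW DMA involved: the driver just copies the payload. */
            memcpy(kva, payload, payload_len);
            return 0;
    }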