At 05:09 PM 3/20/2006, Dror Goldenberg wrote:
>It's not exactly the same. The important difference is about
>scatter/gather.
>If you use dma_mr, then you have to send a chunk list from the client to
>the server. Then, for each one of the chunks, the server has to post an
>RDMA read or write WQE. Also, the typical message size on the wire
>will be a page (I am assuming large IOs for the purpose of this
>discussion).

Yes, of course that is a consideration. The RPC/RDMA protocol carries
many more "chunks" for NFS_READ and NFS_WRITE RPCs in this mode. But the
performance is still excellent, because the server can stream RDMA Writes
and/or RDMA Reads to and from the chunklists in response. Since NFS
clients typically use 32KB or 64KB I/O sizes, such chunklists are
typically 8 or 16 elements, for which the client offers large numbers of
RDMA Read responder resources, along with large numbers of RPC/RDMA
operation credits. In a typical read or write burst, I have seen the
Linux client keep 10 or 20 RPC operations outstanding, each with 8 or 16
RDMA operations and two Sends for the request/response. In full
transactional workloads, I have seen over a hundred RPCs. It's pretty
impressive on an analyzer.
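
To picture what "stream RDMA Writes to the chunklists" looks like at the
verbs level, here is a minimal libibverbs-style sketch: the server posts
one RDMA Write per chunk of a client-supplied write chunklist, chains the
WRs, and signals only the last one. The struct chunk layout, MAX_CHUNKS,
and the helper name are illustrative only, not actual NFS/RDMA server code.

    #include <stdint.h>
    #include <string.h>
    #include <infiniband/verbs.h>

    #define MAX_CHUNKS 16              /* e.g. a 64KB IO in 4KB pages */

    struct chunk {                     /* one element of the client's chunklist */
            uint64_t remote_addr;      /* client-advertised target address */
            uint32_t rkey;             /* client-advertised rkey */
            uint32_t length;           /* bytes to write to this chunk */
    };

    /* Post one RDMA Write per chunk, chained, signaling only the last WR. */
    static int stream_rdma_writes(struct ibv_qp *qp, struct ibv_mr *reply_mr,
                                  void *reply_buf, const struct chunk *chunks,
                                  int nchunks)
    {
            struct ibv_send_wr wr[MAX_CHUNKS], *bad_wr = NULL;
            struct ibv_sge sge[MAX_CHUNKS];
            uint64_t offset = 0;
            int i;

            if (nchunks < 1 || nchunks > MAX_CHUNKS)
                    return -1;

            memset(wr, 0, sizeof(wr));
            for (i = 0; i < nchunks; i++) {
                    sge[i].addr   = (uintptr_t)reply_buf + offset;
                    sge[i].length = chunks[i].length;
                    sge[i].lkey   = reply_mr->lkey;

                    wr[i].wr_id               = i;
                    wr[i].sg_list             = &sge[i];
                    wr[i].num_sge             = 1;
                    wr[i].opcode              = IBV_WR_RDMA_WRITE;
                    wr[i].wr.rdma.remote_addr = chunks[i].remote_addr;
                    wr[i].wr.rdma.rkey        = chunks[i].rkey;
                    /* chain the WRs; signal only the final one */
                    wr[i].next       = (i + 1 < nchunks) ? &wr[i + 1] : NULL;
                    wr[i].send_flags = (i + 1 < nchunks) ? 0 : IBV_SEND_SIGNALED;

                    offset += chunks[i].length;
            }

            /* a single post_send hands the whole burst to the HCA */
            return ibv_post_send(qp, wr, &bad_wr);
    }

In a real server, the Send carrying the RPC reply would be chained behind
the final Write; the sketch stops at the data movement.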

>Alternatively, if you use FMR, you can take the list of pages the IO is
>comprised of, collapse them into a virtually contiguous memory region,
>and use just one chunk for the IO.
>This:
>- Reduces the amount of WQEs that need to be posted per IO operation
>   * lower CPU utilization
>- Reduces the amount of messages on the wire and increases their sizes
>   * better HCA performance

It's all relative! And most definitely not a zero-sum game.

Another way of looking at it: if the only way to get fewer messages is to
incur more client overhead, it's (probably) a bad trade. Besides, we're
nowhere near the op rate of your HCA with most storage workloads. So it's
an even better strategy to just put the work on the wire asap. Then the
throughput simply scales (rises) with demand.

This, by the way, is why the fencing behavior of memory windows is so
painful. I would much rather take an interrupt on bind completion than
fence the entire send queue. But there isn't a standard way to do that,
even in iWARP. Sigh.

Tom.
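
For reference, here is a minimal sketch of the FMR path described above:
collapse the page list behind an IO into one virtually contiguous region
so the RPC carries a single chunk. It assumes the kernel ib_verbs FMR
interface of that era (ib_alloc_fmr, ib_map_phys_fmr, ib_unmap_fmr); the
structures, sizes, and helper names are hypothetical.

    #include <linux/err.h>
    #include <linux/list.h>
    #include <linux/mm.h>
    #include <rdma/ib_verbs.h>

    #define IO_MAX_PAGES 16                /* a 64KB IO with 4KB pages */

    struct io_region {
            struct ib_fmr *fmr;
            u64 iova;                      /* start of the contiguous mapping */
            u32 rkey;                      /* single rkey to advertise as one chunk */
    };

    static int map_io_as_one_chunk(struct ib_pd *pd, u64 *page_dma_addrs,
                                   int npages, struct io_region *reg)
    {
            struct ib_fmr_attr attr = {
                    .max_pages  = IO_MAX_PAGES,
                    .max_maps   = 32,      /* remaps allowed before a flush */
                    .page_shift = PAGE_SHIFT,
            };
            int ret;

            reg->fmr = ib_alloc_fmr(pd, IB_ACCESS_LOCAL_WRITE |
                                        IB_ACCESS_REMOTE_READ |
                                        IB_ACCESS_REMOTE_WRITE, &attr);
            if (IS_ERR(reg->fmr))
                    return PTR_ERR(reg->fmr);

            /* map the scattered pages at one virtual IO address */
            reg->iova = page_dma_addrs[0];
            ret = ib_map_phys_fmr(reg->fmr, page_dma_addrs, npages, reg->iova);
            if (ret) {
                    ib_dealloc_fmr(reg->fmr);
                    return ret;
            }
            reg->rkey = reg->fmr->rkey;    /* one chunk: {rkey, iova, length} */
            return 0;
    }

    static void unmap_io_region(struct io_region *reg)
    {
            LIST_HEAD(fmr_list);

            list_add(&reg->fmr->list, &fmr_list);
            ib_unmap_fmr(&fmr_list);       /* invalidate the mapping */
            ib_dealloc_fmr(reg->fmr);
    }

In practice the FMRs would come from a pre-allocated pool (the ib_fmr_pool
helpers) rather than being allocated per IO, since allocation and the
eventual unmap are the expensive parts.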

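And to illustrate the memory-window point above, here are the two orderings
in question, sketched with the type 1 MW verbs that were later added to
libibverbs (ibv_alloc_mw/ibv_bind_mw), so the calls postdate this thread
and the names and busy-poll are purely illustrative. The first pattern
fences the rkey-advertising Send behind the bind, which is what stalls the
send queue; the second signals the bind and posts the Send only after
reaping its completion, i.e. the "interrupt on bind completion" alternative.

    #include <stdint.h>
    #include <infiniband/verbs.h>

    /* Pattern 1: fence the advertisement Send behind the bind.
     * (adv_sge describes a buffer already holding the new rkey and bounds;
     * mr must have been registered with IBV_ACCESS_MW_BIND.) */
    static int advertise_with_fence(struct ibv_qp *qp, struct ibv_mw *mw,
                                    struct ibv_mr *mr, void *buf, size_t len,
                                    struct ibv_sge *adv_sge)
    {
            struct ibv_mw_bind bind = {
                    .wr_id = 1,
                    .send_flags = 0,                    /* unsignaled bind */
                    .bind_info = {
                            .mr = mr,
                            .addr = (uintptr_t)buf,
                            .length = len,
                            .mw_access_flags = IBV_ACCESS_REMOTE_WRITE,
                    },
            };
            struct ibv_send_wr wr = {
                    .wr_id = 2,
                    .sg_list = adv_sge,
                    .num_sge = 1,
                    .opcode = IBV_WR_SEND,
                    /* IBV_SEND_FENCE stalls this WR behind outstanding prior
                     * work (notably RDMA Reads): the painful part */
                    .send_flags = IBV_SEND_FENCE | IBV_SEND_SIGNALED,
            }, *bad;
            int ret = ibv_bind_mw(qp, mw, &bind);
            if (ret)
                    return ret;
            return ibv_post_send(qp, &wr, &bad);
    }

    /* Pattern 2: signal the bind, reap its completion, then post the Send. */
    static int advertise_after_completion(struct ibv_qp *qp, struct ibv_cq *cq,
                                          struct ibv_mw *mw, struct ibv_mr *mr,
                                          void *buf, size_t len,
                                          struct ibv_sge *adv_sge)
    {
            struct ibv_mw_bind bind = {
                    .wr_id = 1,
                    .send_flags = IBV_SEND_SIGNALED,    /* ask for a completion */
                    .bind_info = { .mr = mr, .addr = (uintptr_t)buf,
                                   .length = len,
                                   .mw_access_flags = IBV_ACCESS_REMOTE_WRITE },
            };
            struct ibv_wc wc;
            struct ibv_send_wr wr = {
                    .wr_id = 2, .sg_list = adv_sge, .num_sge = 1,
                    .opcode = IBV_WR_SEND, .send_flags = IBV_SEND_SIGNALED,
            }, *bad;
            int ret = ibv_bind_mw(qp, mw, &bind);
            if (ret)
                    return ret;
            while (ibv_poll_cq(cq, 1, &wc) == 0)        /* busy-poll for brevity */
                    ;
            if (wc.status != IBV_WC_SUCCESS)
                    return -1;
            return ibv_post_send(qp, &wr, &bad);
    }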