Libor, all,

Tomorrow I plan to start working on SDP zcopy support for synchronous send/recv socket operations. Both kernel-level and user-level initiators should continue to be supported. I don't plan to work on sendfile support yet.

I hope to finish the implementation and some basic testing within the coming two weeks. The development will be done on a branch, to be opened tomorrow. My plan is to merge updates from trunk to stay in sync as much as possible.

What follows is a raw design draft. Comments are welcome.

MST

------------------------------------------------------------------------

What needs to be done to support zcopy for synchronous send/recv socket operations.
Draft rev 2

Currently only AIO is supported for zcopy. We reuse the zcopy infrastructure for send/recv socket operations.

Main differences between AIO and send/recv operations:
- send/recv have more flags: MSG_DONTWAIT, MSG_OOB, MSG_WAITALL, MSG_PEEK.
- After a send/recv call returns, it is illegal for the HCA to touch the application's buffer.
- send/recv must support data sizes too big to be transferred in one InfiniBand operation (SOCK_STREAM applications don't seem to expect to get EMSGSIZE).
- send/recv have the ability to block until an operation completes. This has to be implemented by SDP.
- With send/recv, an operation is revoked by a signal, unlike AIO, which is canceled explicitly by the application.
- Typically, there is only one outstanding send/recv operation on a specific socket.
- send/recv must be supported for kernel and user-space consumers. The current AIO code seems to support only user-level consumers.

Design draft covers: Send side, Receive side, Send/Send deadlock prevention.

-----------------------
Send side:

Operation:
- Attempt zcopy if the message is bigger than the send bcopy threshold.
- If the operation is too big to fit in a single FMR, split it into multiple buffers (iocbs) and queue them for processing.
  Q: Limit the number of FMRs used by a single socket?
  Block till the last iocb completes.
- If no FMRs are available:
  Force bcopy transfer.
- On signal, locate and cancel all queued iocbs.
  This may need to block, in which case we block in uninterruptible state (with a timeout).
  If the iocbs can't be canceled within a predefined time, treat this as a transport error and trigger an abortive close.

Options/Socket flags:
- MSG_OOB: out of band data
  Force bcopy transfer.
  Q: What to do if SrcAvail messages are outstanding?
  A:
- MSG_DONTWAIT/O_NONBLOCK: non-blocking operation
  Force bcopy transfer.

-----------------------
Receive side:

Operation:
- Attempt zcopy if the message is bigger than the rcv bcopy threshold.
- If the operation is too big to fit in a single FMR, split it into multiple buffers (iocbs) and queue them for processing.
  Block till the last iocb completes.
- With MSG_WAITALL:
  Don't post sink available.
- Without MSG_WAITALL:
  Post exactly one sink available at a time. Registration can still be pipelined with RDMA.
  Note that recv without MSG_WAITALL may return a shorter message than what was sent. This is OK:
  "For stream-based sockets, such as SOCK_STREAM, message boundaries shall be ignored. In this case, data shall be returned to the user as soon as it becomes available, and no data shall be discarded."
- If no FMRs are available:
  Force bcopy transfer.
- On signal, locate and cancel all queued iocbs.
  This may need to block, in which case we block in uninterruptible state (with a timeout).
  If the iocbs can't be canceled within a predefined time, treat this as a transport error and trigger an abortive close.

Options/Socket flags:
- MSG_OOB: out of band data
  Handle as it arrives.
  Q: Should we force bcopy transfer (SendSm)?
  A:
- MSG_DONTWAIT/O_NONBLOCK: non-blocking operation
  Force bcopy transfer.
- MSG_PEEK: peek data in buffer
  Force bcopy transfer.

-----------------------
Send/Send deadlock prevention:

Quoting the SDP spec:
Receive side detects deadlock if:
. A SrcAvail is received; and
. No ULP receive buffer is posted; and
. The local Data Source has a SrcAvail outstanding.

There are several ways to resolve this deadlock.

Resolve in the following way:
. The Data Sink could send a SendSm message to force the use of the Bcopy data transfer mechanism.

-- 
MST
