>From: Kanevsky, Arkady [mailto:[EMAIL PROTECTED]
>Roy,
>Can you explain, please?
>
>For IB the operation will be layered properly on Transport primitive.
>And on Recv side it will indicate in completion event DTO
>that it matches RDMA Write with Immediate and that Immediate Data
>is in event.
>
>For iWARP I expect initially, it will be layered on RDMA Write
>followed by Send. The Provider can do post more efficiently
>than Consumer and guarantee atomicity.
>On Recv side Consumer will get Recv DTO completion in event
>and Immediate Data inline as specified by Provider Attribute.
>
>From the performance point of view Consumers who program to IB
>only will have no performance degradation at all. But this API also
>allows Consumers to write ULP to be transport independent
>with minimal penalty: one binary comparison and extra 4 bytes in recv
>buffer.
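For concreteness, here is roughly what that iWARP layering looks like from the provider side, and what the Consumer's "one binary comparison" on the receive side amounts to. This is only a sketch; every name in it is an illustrative stand-in, not an actual uDAPL entry point:

    #include <pthread.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Illustrative stand-ins for provider internals, not real uDAPL types. */
    struct sge  { void *addr; uint32_t len; uint32_t lkey; };
    struct rsge { uint64_t addr; uint32_t len; uint32_t rkey; };
    struct provider_ep {
        pthread_mutex_t sq_lock;
        struct { uint32_t immediate_data; } imm_buf; /* pre-registered buffer */
    };

    int post_rdma_write(struct provider_ep *ep, const struct sge *local,
                        const struct rsge *remote);
    int post_send(struct provider_ep *ep, const void *buf, size_t len);

    /* Emulate "RDMA Write with Immediate" where the transport has none:
     * an RDMA Write followed by a 4-byte Send, posted under the send
     * queue lock so no other thread can interleave a request. */
    int post_write_with_imm(struct provider_ep *ep, const struct sge *local,
                            const struct rsge *remote, uint32_t imm)
    {
        int ret;

        pthread_mutex_lock(&ep->sq_lock);
        ret = post_rdma_write(ep, local, remote);
        if (ret == 0) {
            ep->imm_buf.immediate_data = imm;  /* must be registered memory */
            ret = post_send(ep, &ep->imm_buf, sizeof ep->imm_buf);
        }
        pthread_mutex_unlock(&ep->sq_lock);
        return ret;
    }

    /* Receive side: the Consumer's only clue that a completion is an
     * emulated immediate is the 4-byte length ("one binary comparison"),
     * so a genuine 4-byte Send from the peer looks identical. */
    int is_emulated_immediate(size_t recv_len)
    {
        return recv_len == sizeof(uint32_t);
    }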
If the application could be written transport-independently, I would have no objection at all. Instead, it must be written in a transport-adaptive way, and to adapt to all possible implementations, the application cannot send arbitrary "immediate"-sized data as ordinary messages, because there is no way to distinguish the two on the receiving side. That is HUGE! In my experience send/receive is generally used for small messages, and to take away particular message sizes, or to make the application depend on a provider attribute so it can "adapt" to whatever the immediate size is for a particular transport, if even needed, is a very weak facility to offer.

It also affects interface resource allocation: send queues will have to be sized at possibly twice their depth.

It just dawned on me that the immediate data must be in registered memory to be sent in a message. This means the API must be amended to pass an LMR or, even worse, the provider would have to register memory in the speed path, or create and manage its own queue of "immediate" data buffers/LMRs. Of course, LMRs are not needed by, and are pure overhead for, transports that provide true immediate data.

Oh, and another thing: InfiniBand indicates the size of the RDMA Write in the receive completion. That will have to be addressed in a "transport independent" way, or dropped as part of the service. The bottom line here is that the proposal is NOT transport independent.

Now, the atomicity argument between Write and Send has some credibility. If an application chooses to "adapt" to an explicit Write/Send semantic for write completion notification in environments that cannot provide it natively, this could be addressed by a generalized combined request API that guarantees thread-based atomicity on the send queue. That seems much more straightforward to me, since, in essence, to adapt to non-native immediate data services an application would have to allocate resources and behave in virtually the same way as if it did the Write/Send explicitly.

It is obvious that the proposed service is not immediate data in the sense defined by InfiniBand. Since true immediate data is a transport-specific speed-path service, it should be implemented as a transport-specific extension. To let an application initiate multiple request sequences that must be queued back-to-back, whether to create a write completion notification or any other order-based sequence, a generalized combined request API should be defined.
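For what it's worth, the combined request API I am arguing for could look something like the sketch below. Everything in it is hypothetical; none of these names or types exist in DAT today:

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical request descriptor for a combined post; nothing here
     * is an existing DAT structure or call. */
    enum wr_op { WR_RDMA_WRITE, WR_SEND };
    struct wr {
        enum wr_op op;
        void      *local_buf;    /* registered memory */
        size_t     length;
        uint64_t   remote_addr;  /* used by WR_RDMA_WRITE only */
    };

    /* Hypothetical provider entry point: queue all nwr requests to the
     * send queue as one atomic unit with respect to other threads. */
    int ep_post_request_list(void *ep, const struct wr *wrs, size_t nwr);

    /* Application use: an RDMA Write followed by a small notification
     * Send, guaranteed to be queued back to back. */
    int write_with_notify(void *ep, void *payload, size_t len,
                          uint64_t raddr, uint32_t *notify_buf)
    {
        struct wr wrs[2] = {
            { .op = WR_RDMA_WRITE, .local_buf = payload,
              .length = len,                .remote_addr = raddr },
            { .op = WR_SEND,       .local_buf = notify_buf,
              .length = sizeof *notify_buf, .remote_addr = 0 },
        };
        return ep_post_request_list(ep, wrs, 2);
    }

The design point is that atomic queuing of a request list is a general send-queue service any provider can implement with a lock, rather than a fake "immediate data" message format that leaks into the application's wire protocol.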
