On 2011/03/02, at 12:50, Steven Dake wrote:

> On 03/01/2011 05:50 PM, Zane Bitter wrote:
>> Once more, to the list this time. It seems the Reply-To header is now
>> missing again.
>>
>> On 2011/03/01, at 12:48, Steven Dake wrote:
>>
>>> One more note totemsrp.c also uses free on these frames (which should
>>> have a corresponding free call down through the
>>> totemrrp/totemnet/totemiba+totemudp+totemudpu layers).
>>>
>>> A bit more on this point as I was thinking about it. An IBA frame is
>>> limited to 2048 bytes or 4096 bytes depending on the kernel driver. In
>>> order to use a buffer to send packets, the buffer must be posted to the
>>> send queue (ibv_post_send). Once a buffer has been posted, it may not
>>> be posted again until it is processed by the hardware. ibverbs delivers
>>> an event when a posted buffer is processed by the hardware via a
>>> completion queue (see mcast_cq_send_event_fn).
>>
>> Interesting... the man page for ibv_post_send() says that "The buffers used
>> by a WR can only be safely reused after WR the request is fully executed and
>> a work completion has been retrieved from the corresponding completion queue
>> (CQ)", which is open to interpretation of the word "reuse". Obviously you
>> can't change the data and reuse the buffer for a different frame before the
>> original one has been sent. But can you enqueue it again with the _same_
>> data?
>>
>
> With netmalloc I hadn't thought about the rrp case.
>
> I believe the buffer can be posted to multiple queues. The reason it
> can't be "reused" is because what the RDMA hardware is actually doing is
> a remote dma operation on the hardware. If you were to queue the frame
> in the hardware, then make changes before getting the transmitted event,
> the hardware may end up transmitting a partially changed buffer.
>
> This does create special problems for the rrp case - because rrp must
> allocate one set of frames in iba which act as one global pool (vs the
> current model where there are two separate pools per ring).
>
>> The reason I ask is that the active rrp algorithm sends the token to all
>> non-faulty interfaces. At the moment, the iba driver is doing a memcpy() for
>> each of these; if it still requires a separate buffer for each outgoing
>> frame then the best we can do is reduce the number of memcpy() calls by 1
>> (for the n=1 case that's still a 100% reduction, which is not nothing). I
>> think it would also require a different interface to the totemnet_malloc()
>> function.
>>
>>> A reference count is not needed for totemiba frames because all buffers
>>> are "preallocated" (required by RDMA design) so a totemrrp_free (X)
>>> operation, which would call totemnet_free (X) which would call
>>> totemiba_free (X) would be a no op.
>>>
>>> One area I went wrong when I wrote the iba code originally is I
>>> separated the send and receive buffer data structures into two separate
>>> free lists with two separate data structures. This results in needless
>>> complication and will have to be merged into one "free list" from which
>>> prepared buffers can be retrieved and posted and then put back to. The
>>> reason is because of how the memory protection domains work (a technical
>>> detail of rdma) would limit the ability for the software to work
>>> properly with the current setup and a netmallocing feature. But before
>>> heading down this road, I'd focus instead on keeping the current
>>> totemiba behavior (of the memcpy) and get the rest of the interfaces in
>>> shape.
>>>
>>> Regards
>>> -steve
>>
>> I'm happy to go ahead and implement the first patch, but I'm also trying to
>> get my head around the iba stuff because it seems like how that works could
>> potentially affect what the interface to the totemnet_malloc() function
>> needs to be.
>>
>> cheers,
>> Zane.
>> _______________________________________________
>> Openais mailing list
>> [email protected]
>> https://lists.linux-foundation.org/mailman/listinfo/openais

OK, finally found time to test that first patch :)

I think I have a fairly good idea of how to tackle the iba change. One
question: does the change to a global pool (instead of separate pools for
each ring) mean we need to consider thread-safety when adding/removing
buffers to/from the free list, or are all rings handled by the same thread?

thanks,
Zane.
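
For concreteness, here is a minimal sketch of the single global pool
discussed above, in plain C with no ibverbs calls. All names in it
(struct frame, frame_pool_init(), frame_alloc(), frame_posted(),
frame_completed(), FRAME_COUNT) are hypothetical and are not taken from
totemiba.c. It assumes the same buffer can be posted to several send
queues, which is still an open point in this thread, so it counts
outstanding completions before putting a frame back on the free list. It
also assumes all rings are handled by one thread; if they are not, the
free-list operations would need a lock around them.

```c
#include <assert.h>
#include <stddef.h>

#define FRAME_SIZE  2048  /* one of the two IBA frame limits quoted above */
#define FRAME_COUNT 8     /* hypothetical pool size */

struct frame {
    struct frame *next;     /* link in the global free list */
    unsigned int pending;   /* sends posted but not yet completed */
    unsigned char data[FRAME_SIZE];
};

static struct frame frames[FRAME_COUNT];  /* "preallocated" buffers */
static struct frame *free_list;           /* one pool for all rings */

/* Put every preallocated frame on the single global free list. */
void frame_pool_init(void)
{
    free_list = NULL;
    for (size_t i = 0; i < FRAME_COUNT; i++) {
        frames[i].pending = 0;
        frames[i].next = free_list;
        free_list = &frames[i];
    }
}

/* Take a frame from the pool; NULL means every frame is in flight. */
struct frame *frame_alloc(void)
{
    struct frame *f = free_list;

    if (f != NULL) {
        free_list = f->next;
        f->next = NULL;
    }
    return f;
}

/* Record one post of this frame to a ring's send queue. The real code
 * would call ibv_post_send() at this point. */
void frame_posted(struct frame *f)
{
    f->pending++;
}

/* Call once per send completion. Returns 1 if the frame went back to
 * the free list, 0 if other rings still have it queued. */
int frame_completed(struct frame *f)
{
    assert(f->pending > 0);
    if (--f->pending > 0)
        return 0;
    f->next = free_list;
    free_list = f;
    return 1;
}
```

With two rings, the token frame would be allocated once, passed to
frame_posted() twice, and only returned to the pool on the second call to
frame_completed(), so the data is never modified while either ring still
has it queued.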
