On 03/01/2011 05:50 PM, Zane Bitter wrote:
> Once more, to the list this time. It seems the Reply-To header is now missing
> again.
>
> On 2011/03/01, at 12:48, Steven Dake wrote:
>
>> One more note: totemsrp.c also uses free on these frames (which should
>> have a corresponding free call down through the
>> totemrrp/totemnet/totemiba+totemudp+totemudpu layers).
>>
>> A bit more on this point as I was thinking about it. An IBA frame is
>> limited to 2048 bytes or 4096 bytes depending on the kernel driver. In
>> order to use a buffer to send packets, the buffer must be posted to the
>> send queue (ibv_post_send). Once a buffer has been posted, it may not
>> be posted again until it is processed by the hardware. ibverbs delivers
>> an event when a posted buffer is processed by the hardware via a
>> completion queue (see mcast_cq_send_event_fn).
>
> Interesting... the man page for ibv_post_send() says that "The buffers used
> by a WR can only be safely reused after WR the request is fully executed and
> a work completion has been retrieved from the corresponding completion queue
> (CQ)", which is open to interpretation of the word "reuse". Obviously you
> can't change the data and reuse the buffer for a different frame before the
> original one has been sent. But can you enqueue it again with the _same_ data?
>
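To make the reuse rule concrete, here is a minimal sketch in C. It is not the
totemiba code and makes no ibverbs calls: frame_post() and frame_complete()
are hypothetical stand-ins for posting to a send queue and for the completion
event, and every name in it is made up for illustration. It assumes a single
preallocated pool with one free list, and it assumes a frame may be posted to
more than one ring with the same data. The in_flight counter is also an
assumption of the sketch; whether anything like it is needed depends on how
the completion events end up being handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FRAME_SIZE  2048   /* one of the IBA frame sizes mentioned above */
#define POOL_FRAMES 4      /* hypothetical pool depth */

struct frame {
	unsigned char data[FRAME_SIZE];
	int in_flight;         /* posts whose completion has not arrived yet */
	struct frame *next;    /* free-list link */
};

struct frame_pool {
	struct frame frames[POOL_FRAMES];   /* preallocated once */
	struct frame *free_list;            /* one list shared by all rings */
};

static void pool_init(struct frame_pool *p)
{
	p->free_list = NULL;
	for (int i = 0; i < POOL_FRAMES; i++) {
		p->frames[i].in_flight = 0;
		p->frames[i].next = p->free_list;
		p->free_list = &p->frames[i];
	}
}

/* Take a frame off the free list; returns NULL when the pool is exhausted. */
static struct frame *pool_get(struct frame_pool *p)
{
	struct frame *f = p->free_list;

	if (f != NULL) {
		p->free_list = f->next;
		f->next = NULL;
	}
	return f;
}

/* Stand-in for posting the frame to one ring's send queue. */
static void frame_post(struct frame *f)
{
	f->in_flight++;
}

/* The contents may only be changed while no post is outstanding. */
static int frame_is_writable(const struct frame *f)
{
	return f->in_flight == 0;
}

/* Stand-in for the send completion event of one ring. The frame goes
 * back on the free list only when the last outstanding post completes. */
static void frame_complete(struct frame_pool *p, struct frame *f)
{
	assert(f->in_flight > 0);
	if (--f->in_flight == 0) {
		f->next = p->free_list;
		p->free_list = f;
	}
}
```

In this model a token sent on two rings is filled in once, posted twice, and
stays read-only until both completions have come back, at which point the
frame returns to the shared free list.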
With netmalloc I hadn't thought about the rrp case. I believe the buffer
can be posted to multiple queues. The reason it can't be "reused" is that
the RDMA hardware is actually performing a remote DMA operation in
hardware. If you were to queue the frame in the hardware, then make changes
before getting the transmitted event, the hardware may end up transmitting
a partially changed buffer. This does create special problems for the rrp
case, because rrp must allocate one set of frames in iba which act as one
global pool (vs the current model where there are two separate pools per
ring).

> The reason I ask is that the active rrp algorithm sends the token to all
> non-faulty interfaces. At the moment, the iba driver is doing a memcpy() for
> each of these; if it still requires a separate buffer for each outgoing frame
> then the best we can do is reduce the number of memcpy() calls by 1 (for the
> n=1 case that's still a 100% reduction, which is not nothing). I think it
> would also require a different interface to the totemnet_malloc() function.
>
>> A reference count is not needed for totemiba frames because all buffers
>> are "preallocated" (required by RDMA design) so a totemrrp_free (X)
>> operation, which would call totemnet_free (X) which would call
>> totemiba_free (X), would be a no-op.
>>
>> One area I went wrong when I wrote the iba code originally is I
>> separated the send and receive buffer data structures into two separate
>> free lists with two separate data structures. This results in needless
>> complication and will have to be merged into one "free list" from which
>> prepared buffers can be retrieved and posted and then put back to. The
>> reason is that how the memory protection domains work (a technical
>> detail of rdma) would limit the ability of the software to work
>> properly with the current setup and a netmallocing feature.
>> But before heading down this road, I'd focus instead on keeping the
>> current totemiba behavior (of the memcpy) and get the rest of the
>> interfaces in shape.
>>
>> Regards
>> -steve
>
> I'm happy to go ahead and implement the first patch, but I'm also trying to
> get my head around the iba stuff because it seems like how that works could
> potentially affect what the interface to the totemnet_malloc() function needs
> to be.
>
> cheers,
> Zane.
> _______________________________________________
> Openais mailing list
> [email protected]
> https://lists.linux-foundation.org/mailman/listinfo/openais
