On 03/01/2011 05:50 PM, Zane Bitter wrote:
> Once more, to the list this time. It seems the Reply-To header is now missing 
> again.
> 
> On 2011/03/01, at 12:48, Steven Dake wrote:
> 
>> One more note: totemsrp.c also uses free on these frames (which should
>> have a corresponding free call down through the
>> totemrrp/totemnet/totemiba+totemudp+totemudpu layers).
>>
>> A bit more on this point as I was thinking about it.  An IBA frame is
>> limited to 2048 bytes or 4096 bytes depending on the kernel driver.  In
>> order to use a buffer to send packets, the buffer must be posted to the
>> send queue (ibv_post_send).  Once a buffer has been posted, it may not
>> be posted again until it is processed by the hardware.  ibverbs delivers
>> an event when a posted buffer is processed by the hardware via a
>> completion queue (see mcast_cq_send_event_fn).
> 
> Interesting... the man page for ibv_post_send() says that "The buffers used 
> by a WR can only be safely reused after WR the request is fully executed and 
> a work completion has been retrieved from the corresponding completion queue 
> (CQ)", which is open to interpretation of the word "reuse". Obviously you 
> can't change the data and reuse the buffer for a different frame before the 
> original one has been sent. But can you enqueue it again with the _same_ data?
> 

With netmalloc I hadn't thought about the rrp case.

I believe the buffer can be posted to multiple queues.  The reason it
can't be "reused" is because what the RDMA hardware is actually doing is
a remote dma operation on the hardware.  If you were to queue the frame
in the hardware, then make changes before getting the transmitted event,
the hardware may end up transmitting a partially changed buffer.
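
A rough sketch of that constraint, in plain C with no ibverbs calls.  The
struct and function names are hypothetical and are not taken from totemiba.c;
frame_posted() stands in for a successful ibv_post_send() and
frame_completed() stands in for one send completion event.  Tracking
outstanding posts with a counter is an assumption of this sketch, not a
description of the current code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FRAME_SIZE 2048	/* smaller of the two IBA frame limits mentioned above */

/* Hypothetical frame wrapper; not a corosync structure. */
struct frame {
	unsigned char data[FRAME_SIZE];
	size_t len;
	int in_flight;	/* posts that have not yet completed */
};

/* Fill the frame.  Refused while the hardware may still be reading it. */
static int frame_fill(struct frame *f, const void *src, size_t len)
{
	if (f->in_flight > 0 || len > FRAME_SIZE) {
		return -1;
	}
	memcpy(f->data, src, len);
	f->len = len;
	return 0;
}

/* Stand-in for posting the frame to one send queue. */
static void frame_posted(struct frame *f)
{
	f->in_flight++;
}

/* Stand-in for the send completion event of one earlier post. */
static void frame_completed(struct frame *f)
{
	assert(f->in_flight > 0);
	f->in_flight--;
}
```

With this shape the same data can be posted once per ring, and the frame
only becomes writable again after the last completion has arrived.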

This does create special problems for the rrp case - because rrp must
allocate one set of frames in iba which act as one global pool (vs the
current model where there are two separate pools per ring).
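
To illustrate what "one global pool" could look like, here is a minimal
sketch of a preallocated free list that every ring would draw from.  All
names and the pool size are made up for the example, and it leaves out
memory registration and protection domains entirely.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_FRAMES 4	/* arbitrary size for the example */
#define FRAME_SIZE 2048

/* Hypothetical preallocated frame; not a corosync structure. */
struct pool_frame {
	unsigned char data[FRAME_SIZE];
	struct pool_frame *next;	/* link used only while on the free list */
};

/* One pool shared by all rings, instead of separate pools per ring. */
struct frame_pool {
	struct pool_frame frames[POOL_FRAMES];
	struct pool_frame *free_list;
	size_t free_count;
};

/* Put every preallocated frame on the free list. */
static void pool_init(struct frame_pool *p)
{
	size_t i;

	p->free_list = NULL;
	p->free_count = 0;
	for (i = 0; i < POOL_FRAMES; i++) {
		p->frames[i].next = p->free_list;
		p->free_list = &p->frames[i];
		p->free_count++;
	}
}

/* Take a frame from the pool; returns NULL when the pool is exhausted. */
static struct pool_frame *pool_get(struct frame_pool *p)
{
	struct pool_frame *f = p->free_list;

	if (f != NULL) {
		p->free_list = f->next;
		f->next = NULL;
		p->free_count--;
	}
	return f;
}

/* Return a frame to the pool; nothing is actually freed. */
static void pool_put(struct frame_pool *p, struct pool_frame *f)
{
	f->next = p->free_list;
	p->free_list = f;
	p->free_count++;
}
```

A frame taken with pool_get() could then be handed to whichever ring needs
it and returned with pool_put() once that ring is done with it.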

> The reason I ask is that the active rrp algorithm sends the token to all 
> non-faulty interfaces. At the moment, the iba driver is doing a memcpy() for 
> each of these; if it still requires a separate buffer for each outgoing frame 
> then the best we can do is reduce the number of memcpy() calls by 1 (for the 
> n=1 case that's still a 100% reduction, which is not nothing). I think it 
> would also require a different interface to the totemnet_malloc() function.
> 
>> A reference count is not needed for totemiba frames because all buffers
>> are "preallocated" (required by RDMA design), so a totemrrp_free (X)
>> operation, which would call totemnet_free (X), which would call
>> totemiba_free (X), would be a no-op.
>>
>> One area I went wrong when I wrote the iba code originally is I
>> separated the send and receive buffer data structures into two separate
>> free lists with two separate data structures.  This results in needless
>> complication and will have to be merged into one "free list" from which
>> prepared buffers can be retrieved and posted and then put back to.  The
>> reason is that the way memory protection domains work (a technical
>> detail of RDMA) would limit the ability of the software to work
>> properly with the current setup and a netmallocing feature.  But before
>> heading down this road, I'd focus instead on keeping the current
>> totemiba behavior (of the memcpy) and get the rest of the interfaces in
>> shape.
>>
>> Regards
>> -steve
> 
> I'm happy to go ahead and implement the first patch, but I'm also trying to 
> get my head around the iba stuff because it seems like how that works could 
> potentially affect what the interface to the totemnet_malloc() function needs 
> to be.
> 
> cheers,
> Zane.
> _______________________________________________
> Openais mailing list
> [email protected]
> https://lists.linux-foundation.org/mailman/listinfo/openais
