> -Also, I didn't see any mention of memory registration attributes?  I know
> it's not something apps need from the library, but it's something the RNIC
> needs from the app...

This exists today, so I overlooked including it.  It isn't really a feature 
that's being exposed, though; it's a restriction that providers have in order 
to make this work well.


> There are 4 main lower-level functions that need to be mapped to:
> 
> 1. **8-byte atomic write ordered with RDMA writes**
> OFI defines a more generic atomic write.  Message ordering is controlled
> through fi_tx_attr::msg_order flags.  Data ordering is controlled through
> fi_ep_attr::max_order_waw_size.  The existing API should be sufficient.
> 
> Chet> How will the provider know which opcode to put on the wire if we use
> the same API?

For verbs, this isn't an issue because there is no alternative atomic write 
operation.

For providers with multiple protocols available, the full set of attributes 
used to configure the endpoint needs to guide the selection.  For example, if 
the application requires write-after-write message order, that's indicated 
through a msg_order flag.  If it needs all write data placed in order, 
max_order_waw_size conveys that.

We have places in libfabric today where the protocol changes based on various 
attributes or operational flags.
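
To make the attribute-driven selection concrete, here is a minimal sketch in C 
of how a provider might pick a wire opcode for an atomic write.  Everything 
prefixed sketch_ or SKETCH_ is hypothetical: the flag stands in for 
FI_ORDER_WAW, the struct fields mirror fi_tx_attr::msg_order and 
fi_ep_attr::max_order_waw_size, and the two opcodes are invented for 
illustration.  This is not a description of how any existing provider works.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for FI_ORDER_WAW; real code would take it from <rdma/fabric.h>. */
#define SKETCH_ORDER_WAW (1ULL << 0)

/* Hypothetical wire opcodes, invented for this sketch. */
enum sketch_wire_op {
    SKETCH_OP_GENERIC_ATOMIC_WRITE,  /* no ordering against earlier writes */
    SKETCH_OP_ORDERED_ATOMIC_WRITE   /* ordered with earlier RDMA writes */
};

/* The subset of endpoint attributes this sketch looks at. */
struct sketch_ep_config {
    uint64_t msg_order;          /* mirrors fi_tx_attr::msg_order */
    size_t max_order_waw_size;   /* mirrors fi_ep_attr::max_order_waw_size */
};

/*
 * Choose the ordered opcode only when the application asked for
 * write-after-write message ordering and the payload fits within the
 * size for which write data ordering was requested.
 */
enum sketch_wire_op
sketch_select_atomic_write_op(const struct sketch_ep_config *cfg, size_t len)
{
    int wants_msg_order = (cfg->msg_order & SKETCH_ORDER_WAW) != 0;
    int fits_data_order = len <= cfg->max_order_waw_size;

    if (wants_msg_order && fits_data_order)
        return SKETCH_OP_ORDERED_ATOMIC_WRITE;
    return SKETCH_OP_GENERIC_ATOMIC_WRITE;
}
```

With the WAW flag set and max_order_waw_size of 8, an 8-byte write selects the 
ordered opcode; drop either attribute and the generic opcode is used.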


> 2. **flush data for persistency**
> The low-level flush operation ensures previous RDMA and atomic write
> operations to a given target region are persistent prior to completing.  The
> target region may be accessible through multiple endpoints and NIC ports.
> Also, low-level transports require write after write message and data
> ordering, which is assumed by the flush operation.
> OFI defines FI_COMMIT_COMPLETE for persistent completion semantics.  This
> provides limited support, handling only the following mapping: RMA write
> followed by a matching flush.  A more generic mechanism needs to be defined,
> which would allow for a less strict completion on the RMA writes, with the
> persistent command following.  This is possible today through the FI_FENCE
> flag, but that could result in stalls in the messaging.
> 
> Chet> Does the current implementation assume there is a single write with a
> single flush that has the exact same rkey and regions?  Obviously need to
> assume many writes before a flush and the flush may be for a portion of the
> written region.

The current implementation would only work for a single write followed by a 
single flush to the exact same region.  I'm calling this out to highlight the 
gap, so I wouldn't focus on it beyond that.  This GitHub comment wasn't trying 
to propose a solution.
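
Purely to illustrate the gap (not as a proposed solution), the sketch below 
contrasts the exact-match restriction with the looser check that a flush over 
a portion of a written region would need.  The struct and both function names 
are hypothetical and are not part of libfabric; many writes before one flush 
would need more bookkeeping than is shown here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a remote target range. */
struct sketch_region {
    uint64_t key;   /* remote key (rkey) */
    uint64_t addr;  /* start address within the registered region */
    size_t len;     /* length in bytes */
};

/* Restriction as it stands: the flush must name exactly what was written. */
bool sketch_flush_matches_exact(const struct sketch_region *write,
                                const struct sketch_region *flush)
{
    return write->key == flush->key &&
           write->addr == flush->addr &&
           write->len == flush->len;
}

/*
 * Looser check: the flush may cover any portion of the written range.
 * The comparisons are ordered so the subtractions cannot wrap around.
 */
bool sketch_flush_within_write(const struct sketch_region *write,
                               const struct sketch_region *flush)
{
    return write->key == flush->key &&
           flush->addr >= write->addr &&
           flush->len <= write->len &&
           flush->addr - write->addr <= write->len - flush->len;
}
```

A flush of 1024 bytes in the middle of a 4096-byte write fails the first check 
but passes the second, which is the case raised in the question above.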

> Chet> What about the GO/P PLT placement attributes of the flush command?  We
> will need to expose those as well.

I listed the flush operation for visibility as a separate feature, just 
below.

> 3. **flush data for global visibility**
> This is similar to 2, with application and fabric visibility replacing
> persistency.
> OFI defines FI_DELIVERY_COMPLETE as a visibility completion semantic.  This
> has similar limits as mentioned above.
> 
> 4. **Data verify**
> There is no equivalent existing functionality, but it is aligned with
> discussions around SmartNIC and FPGA support, which defines generic offload
> functionality.
> 
> Chet>  Sounds like a good fit

- Sean
_______________________________________________
ofiwg mailing list
[email protected]
https://lists.openfabrics.org/mailman/listinfo/ofiwg
