[ofiwg] Proposal for enhancement to support additional Persistent Memory use cases (ofiwg/libfabric#5874)

Swaro, James E Mon, 27 Apr 2020 09:34:41 -0700

Introduction
Libfabric requires modifications to support RMA and atomic operations targeted
at remote memory registrations backed by persistent memory devices. These
modifications should be made with the intent of supporting persistent memory
usage by applications that rely on communications middleware such as SHMEM, in
a manner consistent with byte-based/stream-based addressable memory formats.
Existing proposals (the initial proposal) support NVMe/PMoF approaches; this
approach should additionally support flat-memory, non-block-addressed memory
structures and devices.
Changes may be required in as many as three areas:


  *   Memory registration calls
     *   This allows a memory region to be registered as being capable of
persistence. This has already been introduced into the upstream libfabric
GitHub repository, but should be reviewed to ensure it matches use case
requirements.
  *   Completion semantics
     *   These changes allow a completion event or notification to be deferred
until the referenced data has reached the persistence domain at the target.
This has already been introduced into the upstream libfabric GitHub repository,
but should be reviewed to ensure it matches use case requirements.
  *   Consumer control of persistence
     *   As presently implemented in the upstream libfabric GitHub repository,
persistence is determined on a transaction-by-transaction basis. It was
acknowledged at the time that this is a simplistic implementation. We need to
reach consensus on the following:
        *   Should persistence be signaled on the basis of the target memory 
region? For example, one can imagine a scheme where data targeted at a 
particular memory region is automatically pushed into the persistence domain by 
the target, obviating the need for any sort of commit operation.
        *   Is an explicit 'commit' operation of some type required, and if so, 
what is the scope of that commit operation? Is there a persistence fence 
defined such that every operation prior to the fence is made persistent by a 
commit operation?
Proposal
The experimental work in the OFIWG/libfabric branch is sufficient for the needs
of SHMEM, with the exception of the granularity of event generation. The
current implementation generates a commit-level completion event for every
operation. Waiting for every operation to reach the persistence domain makes
completion delivery take longer than necessary for most operations, so SHMEM
needs finer control over when commits are flushed.
To satisfy this, the following is being proposed:

  *   A new API: fi_commit (See definitions: fi_commit)
The new API would be used to send a commit instruction to a target peer.
The instruction would specify a set of memory registration keys or regions,
which the target would use to issue a commit to persistent memory.
     *   A single call to fi_commit should generate a control message to the
target hardware or software emulation environment, instructing it to flush the
contents of the memory targets. The memory targets are described by the iov
structures (address, length, and key fields), and the number of memory targets
is given by the count field. The destination peer is given by the dest_addr
field. The flags field is reserved for now, leaving the API design flexible
enough to accommodate options that may not become apparent until after the
prototype is complete. The context field is available to the user and is
returned with the completion.
     *   Since this API behaves like a data transfer API, it is expected to
generate a completion event on the local completion queue associated with the
endpoint from which the transaction was initiated.
     *   At the target, this should generate an event on the target's event
queue if, and only if, the provider supports software-emulated events. If a
provider is capable of hardware-level commits to persistent memory, the
transaction should be consumed transparently by the hardware and need not
generate an event at the target. This will require an additional event
definition in libfabric (see the definition of fi_eq_commit_entry).
  *   A new EQ event definition (fi_eq_commit_entry) to support 
software-emulated persistence for devices that cannot provide hardware support
     *   The iov and count fields mirror the iov and count of the originating
request.
     *   The flags may be a reduced subset of the original transaction's flags,
on the assumption that only some flags have meaning at the target; forwarding
originator-only flags would be of little value to the target process.
  *   Additional flags or capabilities
     *   A provider should be able to indicate whether it supports
software-emulated notifications of fi_commit, or whether it can handle
hardware requests for commits to persistent memory.
        *   An additional flag should be introduced to the fi_info structure
under modes: FI_COMMIT_MANUAL (name to be determined).
           *   This flag would indicate to the application that events may be 
generated to the event queue for consumption by the application. Commit events 
would be generated upon receipt of a commit message from a remote peer, and the 
application would be responsible for handling the event.
           *   The absence of the FI_COMMIT_MANUAL flag, combined with the
presence of the FI_RMA_PMEM (or FI_PMEM) flag in the info structure, should
imply that the hardware is capable of handling commit requests to persistent
memory and that the application does not need to read the event queue for
commit events.
  *   Change of flag definition
     *   The FI_RMA_PMEM flag should be changed to FI_PMEM to indicate that the 
provider is PMEM aware, and supports RMA/AMO/MSG operations to and from 
persistent memory.
     *   There may be little value in supporting messaging interfaces, but
they could be supported.
  *   Addition of an event handler registration for handling event queue 
entries within the provider context (See Definition: fi_eq_event_handler)
     *   Essentially, this becomes a registered callback for the target
application to handle specific event types. With this mechanism, the provider
can handle events internally using a function supplied by the target
application; that function contains the logic necessary to handle the event.
     *   Specific to PMEM, the target application would use a handler function
to process commits to persistent memory as they are delivered, without
requiring an fi_eq_read call and some form of acknowledgement of the commit
action. With the handler, the commit could be handled entirely by the
application-provided function, and the callback's return code would be
sufficient for a software emulation in the provider to send the sender a
message confirming that the commit transaction is fully complete. The use of a
handler allows the commit transaction to be as lightweight or heavyweight as
necessary.
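To make the capability rules above concrete, here is a minimal sketch of how an application might decide whether it must service commit events, assuming FI_PMEM is reported as a capability bit and FI_COMMIT_MANUAL as a mode bit. Neither name exists in libfabric today, so the HYP_* values are placeholders rather than real header values. The sketch also assumes that a software-emulating provider still advertises FI_PMEM.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical placeholder bits. FI_PMEM and FI_COMMIT_MANUAL are names
 * proposed in this thread; these values do not come from libfabric. */
#define HYP_FI_PMEM          (1ULL << 0)   /* capability: provider is PMEM aware */
#define HYP_FI_COMMIT_MANUAL (1ULL << 1)   /* mode: app must handle commit events */

enum commit_model {
    COMMIT_UNSUPPORTED,   /* no PMEM awareness advertised */
    COMMIT_HARDWARE,      /* commits consumed transparently by hardware */
    COMMIT_SOFTWARE       /* target app services commit events from the EQ */
};

/* Classify a provider from the caps and mode bits it reports in fi_info.
 * Assumption: a software-emulating provider also advertises FI_PMEM. */
static enum commit_model classify_commit_model(uint64_t caps, uint64_t mode)
{
    if (!(caps & HYP_FI_PMEM))
        return COMMIT_UNSUPPORTED;
    if (mode & HYP_FI_COMMIT_MANUAL)
        return COMMIT_SOFTWARE;
    return COMMIT_HARDWARE;
}
```

Under this reading, only COMMIT_SOFTWARE obliges the target application to read the event queue or register a handler for commit events.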
Definitions:
fi_commit
ssize_t fi_commit(struct fid_ep *ep,
                  const struct fi_rma_iov *iov,
                  size_t count,
                  fi_addr_t dest_addr,
                  uint64_t flags,
                  void *context);
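The argument rules described for fi_commit can be sketched as follows. Because fi_commit is only proposed, this code does not call libfabric. struct rma_iov mirrors the addr/len/key layout of libfabric's struct fi_rma_iov. struct commit_ctrl_msg, COMMIT_MAX_IOV, and sketch_commit_pack() are hypothetical names for what a software-emulating provider might pack into its control message. The context pointer is left out because it stays local and is only returned with the initiator's completion.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t */

/* Local mirror of libfabric's struct fi_rma_iov (addr, len, key). */
struct rma_iov {
    uint64_t addr;
    size_t   len;
    uint64_t key;
};

#define COMMIT_MAX_IOV 8   /* hypothetical per-message limit */

/* Hypothetical control message a software-emulating provider might send. */
struct commit_ctrl_msg {
    uint64_t       dest_addr;   /* fi_addr_t is a uint64_t in libfabric */
    size_t         count;
    uint64_t       flags;
    struct rma_iov iov[COMMIT_MAX_IOV];
};

/* Hypothetical helper: validate fi_commit-style arguments and pack them.
 * Flags are reserved in the proposal, so any nonzero value is rejected. */
static ssize_t sketch_commit_pack(struct commit_ctrl_msg *msg,
                                  const struct rma_iov *iov, size_t count,
                                  uint64_t dest_addr, uint64_t flags)
{
    if (!msg || !iov || count == 0 || count > COMMIT_MAX_IOV)
        return -EINVAL;
    if (flags != 0)
        return -EINVAL;
    msg->dest_addr = dest_addr;
    msg->count = count;
    msg->flags = flags;
    memcpy(msg->iov, iov, count * sizeof(*iov));
    return 0;
}
```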
fi_eq_commit_entry
struct fi_eq_commit_entry {
    fid_t                    fid;    /* fid associated with request */
    const struct fi_rma_iov *iov;    /* iovec of memory regions to be
                                        committed to persistent memory */
    size_t                   count;  /* number of iovec/key entries */
    uint64_t                 flags;  /* operation-specific flags */
};
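Here is a minimal sketch of how a software-emulating provider might populate the proposed entry at the target. The iov and count are mirrored from the request, and the flags are reduced to a target-relevant subset. The proposal does not say which flags survive, so the HYP_* flag bits and HYP_TARGET_FLAG_MASK are assumptions. void * stands in for fid_t so that the sketch compiles without libfabric headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct rma_iov { uint64_t addr; size_t len; uint64_t key; };

/* Mirror of the proposed fi_eq_commit_entry; void * stands in for fid_t. */
struct commit_entry {
    void                 *fid;
    const struct rma_iov *iov;
    size_t                count;
    uint64_t              flags;
};

/* Hypothetical flag bits: which flags matter at the target is not settled
 * in the proposal, so this split is an assumption for illustration. */
#define HYP_FLAG_ORIGIN_ONLY  (1ULL << 0)   /* meaningful only at the initiator */
#define HYP_FLAG_TARGET_A     (1ULL << 1)
#define HYP_FLAG_TARGET_B     (1ULL << 2)
#define HYP_TARGET_FLAG_MASK  (HYP_FLAG_TARGET_A | HYP_FLAG_TARGET_B)

static void fill_commit_entry(struct commit_entry *e, void *fid,
                              const struct rma_iov *iov, size_t count,
                              uint64_t origin_flags)
{
    assert(e != NULL);
    e->fid   = fid;
    e->iov   = iov;                                   /* mirrors the request */
    e->count = count;
    e->flags = origin_flags & HYP_TARGET_FLAG_MASK;   /* drop originator-only bits */
}

/* Total number of bytes the target would need to flush for this entry. */
static uint64_t commit_entry_bytes(const struct commit_entry *e)
{
    uint64_t total = 0;
    for (size_t i = 0; i < e->count; i++)
        total += e->iov[i].len;
    return total;
}
```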
fi_eq_event_handler
typedef ssize_t (*fi_eq_event_handler_t)(struct fid_eq *eq,
    uint64_t event_type,
    void *event_data,
    uint64_t len,
    void *context);
ssize_t fi_eq_register_handler(struct fid_eq *eq,
    uint64_t event_type,
    fi_eq_event_handler_t handler,
    void *context);
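The handler mechanism can be modelled with a small in-process stand-in for the EQ. struct mock_eq, mock_eq_register_handler(), mock_eq_deliver(), and the MOCK_EQ_COMMIT event number are all hypothetical. The handler type has the same shape as the proposed fi_eq_event_handler_t. A real handler would translate each (addr, key) pair to a local mapping and flush it, for example with libpmem's pmem_persist(). This sketch only tallies the bytes it would flush.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

#define MOCK_MAX_EVENT_TYPES 16
#define MOCK_EQ_COMMIT       3     /* hypothetical event type for commit events */

struct mock_eq;

/* Same shape as the proposed fi_eq_event_handler_t, with a mock EQ type. */
typedef ssize_t (*mock_eq_event_handler_t)(struct mock_eq *eq,
                                           uint64_t event_type,
                                           void *event_data, uint64_t len,
                                           void *context);

struct mock_eq {
    mock_eq_event_handler_t handler[MOCK_MAX_EVENT_TYPES];
    void                   *context[MOCK_MAX_EVENT_TYPES];
};

struct rma_iov { uint64_t addr; size_t len; uint64_t key; };

struct commit_entry {           /* mirror of the proposed fi_eq_commit_entry */
    void                 *fid;
    const struct rma_iov *iov;
    size_t                count;
    uint64_t              flags;
};

static ssize_t mock_eq_register_handler(struct mock_eq *eq, uint64_t event_type,
                                        mock_eq_event_handler_t handler,
                                        void *context)
{
    if (!eq || event_type >= MOCK_MAX_EVENT_TYPES)
        return -EINVAL;
    eq->handler[event_type] = handler;
    eq->context[event_type] = context;
    return 0;
}

/* Provider side: invoke the registered handler, if any. Without a handler
 * the event would be queued for fi_eq_read instead (not modelled here). */
static ssize_t mock_eq_deliver(struct mock_eq *eq, uint64_t event_type,
                               void *event_data, uint64_t len)
{
    if (event_type >= MOCK_MAX_EVENT_TYPES || !eq->handler[event_type])
        return -ENOENT;
    return eq->handler[event_type](eq, event_type, event_data, len,
                                   eq->context[event_type]);
}

/* Application handler: a real one would flush each region to persistent
 * memory; this sketch only counts the bytes it would flush. */
static ssize_t on_commit(struct mock_eq *eq, uint64_t event_type,
                         void *event_data, uint64_t len, void *context)
{
    const struct commit_entry *e = event_data;
    uint64_t *bytes_flushed = context;

    (void)eq;
    (void)event_type;
    if (len < sizeof(*e))
        return -EINVAL;
    for (size_t i = 0; i < e->count; i++)
        *bytes_flushed += e->iov[i].len;
    return 0;   /* success: provider may acknowledge the commit to the sender */
}
```

In this model, a return value of 0 from the handler means the provider may acknowledge the commit to the sender. When no handler is registered, the sketch returns -ENOENT; a real provider would instead leave the event for fi_eq_read.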
Use cases supported by this proposal:

  *   As an application writer, I need to commit multiple previously sent data
transfers to the persistence domain
     *   Existing functionality allows either a commit for every message, as is
the case with FI_COMMIT_COMPLETE, or the use of FI_COMMIT on a per-transaction
basis. The need in this use case is performance-oriented: most messages
targeted at the persistence domain should complete under a less strict
delivery model (delivery to the NIC), followed by a 'flush' from the NIC to the
persistence domain. This provides a mechanism to ensure that those data
transfers are eventually persisted.
  *   As an application writer, I would like to be able to support persistent 
data models with libfabric over providers that do not provide hardware support 
for persistent memory devices
     *   The GNI provider and other providers will not be able to support PMEM
use cases, at least not right away. To provide PMEM support in prototypes, or
in providers that will never have PMEM support, a software-emulated approach
was suggested to bridge the gap in functionality. The new EQ event was created
so that the target knows when something needs to be flushed to the persistence
domain. In addition to the EQ event, it was suggested that applications could
provide handler functions to be called when an EQ event is delivered,
facilitating a more passive libfabric application. If a handler were provided
to libfabric, the application itself could focus on serving requests for
access to the persistence domain and on sharing the persistent memory.
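For the first use case, an initiator needs to remember which regions it wrote under the relaxed delivery model so that a single commit can cover them later. The sketch below shows one way to do that. It assumes that writes are recorded in issue order and that adjacent writes under the same key can be merged into one entry. struct pending_set, pending_add(), and pending_take() are hypothetical helpers. The resulting array is what the application would pass to the proposed fi_commit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of libfabric's struct fi_rma_iov (addr, len, key). */
struct rma_iov { uint64_t addr; size_t len; uint64_t key; };

#define PENDING_MAX 32   /* hypothetical capacity before a commit is forced */

/* Hypothetical record of regions written but not yet committed. */
struct pending_set {
    struct rma_iov iov[PENDING_MAX];
    size_t         count;
};

/* Record a region written with relaxed completion. The previous entry is
 * extended when the new write is contiguous with it under the same key. */
static int pending_add(struct pending_set *p, uint64_t addr, size_t len,
                       uint64_t key)
{
    if (p->count > 0) {
        struct rma_iov *last = &p->iov[p->count - 1];
        if (last->key == key && last->addr + last->len == addr) {
            last->len += len;
            return 0;
        }
    }
    if (p->count == PENDING_MAX)
        return -ENOSPC;   /* caller should commit now, then retry */
    p->iov[p->count++] = (struct rma_iov){ addr, len, key };
    return 0;
}

/* Copy the accumulated regions out for a single commit and reset the set. */
static size_t pending_take(struct pending_set *p, struct rma_iov *out)
{
    size_t n = p->count;
    memcpy(out, p->iov, n * sizeof(*out));
    p->count = 0;
    return n;
}
```

Merging only with the most recent entry keeps pending_add() constant-time. A fuller version could sort and merge all entries before committing.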


James Swaro Software Engineer  | Cray, a Hewlett Packard Enterprise company
2131 Lindau Lane, Suite 1000 | Bloomington, MN 55425
+1-651-605-9000  [email protected]
www.cray.com


_______________________________________________
ofiwg mailing list
[email protected]
https://lists.openfabrics.org/mailman/listinfo/ofiwg
