Thanks for the feedback. I'll incorporate the feedback and open a proposal PR.

Ian

-----Original Message-----
From: Hefty, Sean <[email protected]> 
Sent: Monday, May 18, 2020 3:04 PM
To: Ziemba, Ian <[email protected]>; [email protected]
Subject: RE: API for Libfabric Overrides

> The user would register the override structure to a fid using a new 
> fi_set_ops function. The following is an example of this.
> 
> 
> 
> enum fi_set_ops {
> 
>        FI_OVERRIDE_OPS,
> 
> };
> 
> 
> 
> struct fi_ops {
> 
>        size_t size;
> 
>        int    (*close)(struct fid *fid);
> 
>        int    (*bind)(struct fid *fid, struct fid *bfid, uint64_t flags);
> 
>        int    (*control)(struct fid *fid, int command, void *arg);
> 
>        int    (*ops_open)(struct fid *fid, const char *name,
> 
>                          uint64_t flags, void **ops, void *context);
> 
>        int    (*tostr)(const struct fid *fid, char *buf, size_t len);
> 
>        int    (*ops_set)(struct fid *fid, enum fi_set_ops ops_type, void *ops,
> 
>                         uint64_t flags);

The ops_open call uses a name to identify the ops.  This is more generic, as it 
allows provider specific operations to be set with less chance of conflict.  
Version data is conveyed by the name and embedded ops size field.

Passing in a context parameter provides extra extensibility, as that parameter 
can be used for other purposes based on flags.


> fi_set_ops(struct fid *fid, enum fi_set_ops ops_type, void *ops, 
> uint64_t flags)
> 
> {
> 
>        if (!fid->ops->ops_set)
> 
>               return -FI_ENOSYS;

The FI_CHECK_OP macro properly performs this check.


> The main issue with this approach is how to provide meaningful 
> feedback to the user if a provider does not support a requested 
> override. I do not think the above approach I presented can accomplish 
> this. Instead, I think minor adjustments could be made to provide users this 
> feedback. Here is a second approach to address overrides.

Providers can return ENOSYS if the requested support is not available.  See 
below.


> All override operations are defined in a union instead of a structure.
> 
> 
> 
> union fi_override_op {
> 
> 
> 
>        /* Copy from a host destination buffer to an HMEM iov. */
> 
>        ssize_t (*copy_from_hmem_iov)(void *dest, size_t size,
> 
>                                  const struct iovec *hmem_iov,
> 
>                                  enum fi_hmem_iface hmem_iface,
> 
>                                  size_t hmem_iov_count,
> 
>                                  uint64_t hmem_iov_offset);
> 
> 
> 
>        /* Copy from an HMEM iov to a host source buffer. */
> 
>        ssize_t (*copy_to_hmem_iov)(const struct iovec *hmem_iov,
> 
>                                 enum fi_hmem_iface hmem_iface,
> 
>                                size_t hmem_iov_count,
> 
>                                uint64_t hmem_iov_offset, void *src,
> 
>                                size_t size);
> 
> };
> 
> 
> 
> Instead of having an enum value used to define an override structure, 
> an enum value would be defined for each override.

This is equivalent to limiting each set ops call to overriding a single 
function.  There's no need to define a union, for example.

Overrides should be all or nothing.  If an individual function can be set 
separate from other functions, it should go into its own ops structure.  If a 
group of functions are set together, then those functions should be in a single 
ops structure.  This isn't a problem with the set_ops API, but the definition 
of the ops structure(s).

Separately, we need to decide if applications should call set_ops once for each 
function, or if in practice, all will be set, and only 1 call is needed.  Both 
options can even be supported at the cost of increased development of 
maintenance.


> A user would have to call fi_set_op for each operation they wish to 
> override. This would allow a provider to provide per override support 
> information back to the user. If a provider does not support a given 
> override or overrides in general, -FI_ENOSYS would be returned. Users should 
> not treat -FI_ENOSYS as a fatal error.

ENOSYS is an easy case, and likely all we need up front.  But, ultimately, the 
behavior (returned value/data) of the fi_set_ops call depends on the full 
identity of the ops being set.  So, we have the flexibility to return different 
types of errors in the future if needed.  For instance, if the ops size is 
returned, that can be used to report that the provider was only able to 
override some of the functions because it recognized an older version of the 
structure.


- Sean
_______________________________________________
ofiwg mailing list
[email protected]
https://lists.openfabrics.org/mailman/listinfo/ofiwg

Reply via email to