Re: [ofiwg] [libfabric-users] Two-stage completion

2016-09-16 Thread Smith, Jonathan D
Hi Sean!

Thanks for your help so far. I think I'm getting somewhere! I have some 
responses and more details for you:

> those flags only apply to the fi_sendmsg call.  Other send operations do not 
> take flags, and they do not apply to the fi_recvmsg call.

Understood. I'm using connectionless OFI with tags, so the send call I've been 
using is fi_tsendmsg. I'm using the flags to specify what kind of completion I 
want.

> Libfabric does not define blocking operations.  All operations are 
> asynchronous.

Ah, I understand that, but I can see how what I said would be ambiguous.

It might help if I explain that I'm providing a wrapper layer on OFI. It must 
manage multiple fabric providers in parallel, and provide an interface that 
includes a send function that could, in the face of a network failure, use 
alternative providers. I would like *that* send function to be a blocking call, 
which I have been implementing in prototype using the completion queue events.

I would also like to support message timeout so I know to try to send the 
message with a different provider. 

This is what led me to the idea that I needed two completion events.
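
A rough sketch of the wrapper-level send described above, for illustration only. Every name here (provider_t, blocking_send, mock_post, mock_poll, max_polls) is hypothetical and none of it is libfabric API. Each provider is modelled as a pair of function pointers, and the timeout is a poll budget standing in for a wall-clock deadline. In a real wrapper, post_send and poll_completion would presumably wrap a provider's tagged send call and its completion queue read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical provider handle: a stand-in for a real endpoint plus CQ. */
typedef struct provider {
    const char *name;
    /* Post an asynchronous send; returns 0 on success, nonzero on failure. */
    int (*post_send)(struct provider *p, const void *buf, size_t len);
    /* Non-blocking poll; returns true once the posted send has completed. */
    bool (*poll_completion)(struct provider *p);
    /* Mock state: number of empty polls before completion, <0 = never. */
    int pending_polls;
} provider_t;

/* Mock post: always accepts the send. */
int mock_post(provider_t *p, const void *buf, size_t len)
{
    (void)p; (void)buf; (void)len;
    return 0;
}

/* Mock poll: reports completion after pending_polls empty polls. */
bool mock_poll(provider_t *p)
{
    if (p->pending_polls < 0)
        return false;           /* simulated network failure */
    if (p->pending_polls == 0)
        return true;
    p->pending_polls--;
    return false;
}

/*
 * Blocking send with timeout and failover. Tries each provider in order,
 * spins on its completion poll for at most max_polls iterations, and
 * moves on to the next provider if no completion arrives in time.
 * Returns the index of the provider that completed, or -1 if none did.
 */
int blocking_send(provider_t *provs, size_t nprov,
                  const void *buf, size_t len, int max_polls)
{
    assert(provs != NULL && max_polls > 0);
    for (size_t i = 0; i < nprov; i++) {
        provider_t *p = &provs[i];
        if (p->post_send(p, buf, len) != 0)
            continue;           /* could not even post: try the next one */
        for (int n = 0; n < max_polls; n++) {
            if (p->poll_completion(p))
                return (int)i;
        }
        /* Timed out waiting for the completion: fall through to failover. */
    }
    return -1;
}
```

With two providers where the first never completes and the second completes after a couple of polls, blocking_send(provs, 2, buf, len, 10) returns 1. What "completed" means is entirely up to the completion level the wrapper asks the provider for, which is the open question in this thread.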

> You need to clarify what it means for the destination to receive the message. 
>  Is the destination the peer process?  Peer node?  Peer NIC?

The aforementioned timeout is the reason I need some kind of ACK. I need 
verification that the peer process, if it were to post a receive buffer, would 
get the message (my understanding is that if I post a send, and then much later 
the peer posts a recv buffer, that's ok, since the providers have some notion 
of a queue of receives that haven't been given to the process yet). This may 
mean that the message got to the peer NIC, but I'm not sure.

> The inject and tagged message calls will work with unconnected (RDM - reliable 
> datagram) endpoints.

Great! That helps a lot. It might be wise to reword the man page section on the 
inject and send calls, because send claims it only works with connected 
endpoints, and inject claims it's an optimized version of send.

I'll end on a single question:
If I start using the inject call to block until the buffer is safe, how do I 
get the kind of completion I'd need for my timeout, if there's no flags 
argument in the fi_tinject call?

Thanks!
Jonathan

-----Original Message-----
From: Hefty, Sean 
Sent: Thursday, September 15, 2016 1:18 PM
To: Smith, Jonathan D; Jeff Hammond
Cc: libfabric-us...@lists.openfabrics.org; ofiwg@lists.openfabrics.org
Subject: RE: [libfabric-users] Two-stage completion

> ...oh? I thought that FI_TRANSMIT_COMPLETE was the local completion 
> and FI_DELIVERY_COMPLETE was the remote completion. What does this 
> mean, then?
> 
> FI_TRANSMIT_COMPLETE
>   Applies to fi_sendmsg. Indicates that a completion should not be 
> generated until the operation has been successfully transmitted and is 
> no longer being tracked by the provider.
> FI_DELIVERY_COMPLETE
>   Applies to fi_sendmsg. Indicates that a completion should be 
> generated when the operation has been processed by the destination.

You are looking at the flags discussion for the send/receive operations.  This 
is calling out that those flags only apply to the fi_sendmsg call.  Other send 
operations do not take flags, and they do not apply to the fi_recvmsg call.

These flags can set the completion model for a specific send operation.  The 
documentation here does not try to re-state the full meaning of the completion 
mode, however.

> Anyway, I realize that I'm trying to have my cake and eat it too, but 
> in general, I'm looking for:
>   1. Blocking send semantics over unconnected endpoints

Libfabric does not define blocking operations.  All operations are asynchronous.

>   2. You get to send again as soon as the buffer is safe to write to 
> (currently my use case for the cq),

The buffer may be re-used either after you get a completion or immediately if 
an inject call is used.

>   3. You also get some kind of event when we're sure the destination 
> received the event,

You need to clarify what it means for the destination to receive the message.  
Is the destination the peer process?  Peer node?  Peer NIC?

>   4. The application doesn't perform extra copy operations on the 
> message unless it's completely unavoidable.

Well, what exactly is an 'extra copy'?  :)

The API does not dictate how a provider implements the various completion 
semantics.  Providers may copy the message buffers into outbound message 
queues, some internal buffer, or whatnot.  You just hope that the provider 
selects the best performing option.

> This doesn't mean "return from the send call with the buffer safe 
> ASAP," as in that case I would just use the memcpy strategy.
> 
> So is fi_tinject with FI_INJECT_COMPLETE what I want? It seems that 
> that's probably the case, but I do need unconnected endpoints. Since 
> the man pages say 

Re: [ofiwg] [libfabric-users] Two-stage completion

2016-09-16 Thread Smith, Jonathan D
And then what completion mode should I select? FI_DELIVERY_COMPLETE?

-----Original Message-----
From: Hefty, Sean 
Sent: Thursday, September 15, 2016 3:36 PM
To: Smith, Jonathan D; Jeff Hammond
Cc: libfabric-us...@lists.openfabrics.org; ofiwg@lists.openfabrics.org
Subject: RE: [libfabric-users] Two-stage completion

> If I start using the inject call to block until the buffer is safe, 
> how do I get the kind of completion I'd need for my timeout, if 
> there's no flags argument in the fi_tinject call?

To reuse the buffer immediately but still get a completion, you should call 
fi_tsendmsg with the FI_INJECT flag.  The default completion mode will then be 
used.
___
ofiwg mailing list
ofiwg@lists.openfabrics.org
http://lists.openfabrics.org/mailman/listinfo/ofiwg


Re: [ofiwg] [libfabric-users] Two-stage completion

2016-09-16 Thread Hefty, Sean
> I'm not sure that using fi_tsendmsg with the FI_INJECT flag would meet
> Jonathan's requirements - if I'm reading this email chain correctly.
> I'll reorder his requirements and add comments:
> 
> > 4. The application doesn't perform extra copy operations on the
> message unless it's completely unavoidable.
> 
> You also don't want the *provider* to do any internal memory copies. In
> my mind, using FI_INJECT means "release ownership of the source buffer
> and return from this function ASAP". Maybe the provider doesn't have
> hardware support for this and needs to memcpy, or maybe the hardware is
> busy and the provider needs to memcpy. Also, maybe the provider doesn't
> support a large enough "inject size" transmit attribute - the
> application would have to memcpy. Or maybe the provider implements an
> arbitrarily large "inject size" using an internal memcpy?
> 
> > 1. Blocking send semantics over unconnected endpoints
> 
> This is a blocking send from the perspective of his wrapper library,
> right?  The library implementation would be to spin on fi_cq_read()
> until a FI_INJECT_COMPLETE event and then return to the application.

We need to be very clear what semantic 'blocking send' needs to convey.  
Sockets does blocking send calls, but that doesn't mean anything regarding the 
location of the message when the call returns.  If the desired semantic is that 
the buffer may be immediately re-used/freed/modified, then the FI_INJECT flag 
is the correct mapping.

The API does not dictate an implementation, and IMO apps should focus on 
performance requirements, not implementation details that they may believe lead 
to lower performance. 

> > 2. You get to send again as soon as the buffer is safe to write
> to (currently my use case for the cq),
> 
> Same as waiting for FI_INJECT_COMPLETE event.
> 
> > 3. You also get some kind of event when we're sure the
> destination received the event
> 
> This is just FI_DELIVERY_COMPLETE|FI_TRANSMIT_COMPLETE
> 
> The real question is if two events can be generated for one operation.
> The documentation for the FI_*_COMPLETE flags in the fi_endpoint man
> page all start like this:
> 
>   "Indicates that a completion should be generated when ..."
> 
> "... a completion ..." sounds like there could be more than one
> completion event. It should say "... the completion ..." if only one
> event can be generated for each operation.

We want a single completion per operation.  Mere humans need to write to this 
API.

> Earlier, Sean said "The timing difference between the two completion
> models seems minimal, especially compared to the time needed to
> generate, transfer, and process an additional ack." But, this is
> comparing FI_TRANSMIT_COMPLETE and FI_DELIVERY_COMPLETE.  The timing
> difference is quite noticeable between FI_INJECT_COMPLETE (local
> operation) and FI_TRANSMIT_COMPLETE|FI_DELIVERY_COMPLETE (remote
> operation). As Jeff Hammond pointed out, several different middleware
> could use - needs - this distinction for best performance.

I was comparing transmit versus delivery complete because those were the 
completion modes called out in the previous emails.

But if you want to go further and compare say inject versus delivery complete, 
I'm still not convinced that multiple completions are beneficial.

When you consider reliability, the source data needs to remain untouched until 
it has been acked by the remote side.  This would seem to imply that either the 
source data is copied or that "local completion" == "remote completion".  In 
the former case, source data may be copied to another memory buffer, the 
transmit queue, or NIC memory.  FI_INJECT covers this case.

- Sean
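
The copy-versus-ack argument above can be made concrete with a toy model. This is not libfabric code: toy_endpoint, toy_send_inject, toy_remote_ack, toy_cq_read and TOY_INJECT_SIZE are all made-up names, and tying the completion to the remote ack is an assumption chosen for the illustration. The sketch only shows that once the payload is copied into provider-owned storage, the caller may overwrite its buffer immediately while the operation still produces exactly one completion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical limit, playing the role of a provider's inject size. */
#define TOY_INJECT_SIZE 64

typedef struct {
    char staged[TOY_INJECT_SIZE]; /* provider-owned copy of the payload */
    size_t len;
    bool in_flight;               /* posted but not yet acked by the peer */
    bool completion_ready;        /* the single completion for this send */
} toy_endpoint;

/*
 * Inject-style send: the payload is copied before returning, so the
 * caller's buffer is reusable as soon as this function returns.
 * Returns 0 on success, -1 if the endpoint is busy or len is too large.
 */
int toy_send_inject(toy_endpoint *ep, const void *buf, size_t len)
{
    assert(ep != NULL);
    if (ep->in_flight || len > TOY_INJECT_SIZE)
        return -1;
    memcpy(ep->staged, buf, len);
    ep->len = len;
    ep->in_flight = true;
    ep->completion_ready = false;
    return 0;
}

/* Simulate the remote ack arriving; this queues the one completion. */
void toy_remote_ack(toy_endpoint *ep)
{
    if (ep->in_flight) {
        ep->in_flight = false;
        ep->completion_ready = true;
    }
}

/* Non-blocking completion read: returns true exactly once per send. */
bool toy_cq_read(toy_endpoint *ep)
{
    if (!ep->completion_ready)
        return false;
    ep->completion_ready = false;
    return true;
}
```

After toy_send_inject(&ep, msg, sizeof msg) the caller can overwrite msg right away; ep.staged still holds the original bytes, toy_cq_read returns false until toy_remote_ack runs, and then returns true once. Without the copy, the buffer could not be released before the ack, which is the "local completion" == "remote completion" case.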