Re: [Lustre-devel] RE: More LND: error handling

Scott Atchley Mon, 11 Dec 2006 04:23:22 -0800

On Dec 11, 2006, at 6:06 AM, Eric Barton wrote:

John,

Does that sound roughly right?  Anything else I should be taking
into account?


The guiding principles for completion are...

1. If you return success from lnd_send or lnd_recv, you must call
   lnet_finalize() within finite time.

2. You may only call lnet_finalize() when there is no longer any
   chance that the underlying network can touch (read or write) the
   payload buffer.

3. The completion status on sends isn't critical.  Lustre only really
   needs to know that sending is over; knowing whether the send was
   good or not is really just icing on the cake (e.g. so that it
   doesn't have to wait for a full timeout for an RPC reply if sending
   the request failed).

4. The completion status on receives is completely critical.  You may
   only return success if the sink buffer has been filled correctly.
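
As a rough illustration of principles 1, 2 and 4, here is a minimal,
single-threaded sketch of a receive completion path. Everything named
sketch_* is a hypothetical stand-in invented for this example (it is
not the real lnet_msg_t, lnet_finalize() or any actual LND code), and
the choice of -EIO for a short receive is likewise an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an LNet message; not the real lnet_msg_t. */
struct sketch_msg {
    int finalized;   /* number of times finalize was called */
    int status;      /* completion status handed to finalize */
};

/* Hypothetical stand-in for lnet_finalize(); it only records the call. */
static void sketch_finalize(struct sketch_msg *msg, int status)
{
    msg->finalized++;
    msg->status = status;
}

/* Hypothetical receive descriptor owned by the LND. */
struct sketch_rx {
    struct sketch_msg *msg;
    char              *sink;      /* payload buffer from the upper layer */
    size_t             expected;  /* bytes the sink must hold for success */
    int                net_busy;  /* nonzero while the network may touch sink */
};

/* Called once the simulated network reports that the transfer has ended. */
static void sketch_rx_done(struct sketch_rx *rx, const char *data,
                           size_t nob, int net_error)
{
    int status = 0;

    if (net_error != 0)
        status = net_error;          /* transport reported a failure */
    else if (nob != rx->expected)
        status = -EIO;               /* short receive: sink is not filled */
    else
        memcpy(rx->sink, data, nob); /* simulate the network filling sink */

    /* Principle 2: the network can no longer read or write the buffer. */
    rx->net_busy = 0;
    assert(!rx->net_busy);

    /* Principle 1: finalize exactly once. Principle 4: status is 0 only
     * when the sink buffer was filled correctly. */
    sketch_finalize(rx->msg, status);
}
```

Calling sketch_rx_done() with a full-length payload finalizes the
message with status 0; a short payload still finalizes it, but with a
negative status, so the upper layer never mistakes a partial sink
buffer for a good one.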

                Cheers,
                        Eric


Two other comments:

1) Do not hold any locks when calling any lnet_ functions.

2) Make sure you are _completely_ done with your buffer before
calling lnet_finalize(). I ran into a race condition where I called
lnet_finalize() then placed the rx or tx descriptor on my idle
queue. :-)
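
One way to read the two comments above is as an ordering rule for the
completion path: detach the message, return the descriptor to the idle
queue under the lock, drop the lock, and only then finalize. The sketch
below is single-threaded and purely illustrative. The sketch_* names,
the integer flag standing in for a real lock, and the list layout are
all assumptions made for this example, not actual LND or LNet code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an LNet message; not the real lnet_msg_t. */
struct sketch_msg {
    int finalized;
    int status;
};

/* Hypothetical transmit descriptor kept on a singly linked idle list. */
struct sketch_tx {
    struct sketch_tx  *next;
    struct sketch_msg *msg;
};

/* Hypothetical per-LND state; lock_held stands in for a real spinlock. */
struct sketch_lnd {
    int               lock_held;
    struct sketch_tx *idle;
};

static void sketch_lock(struct sketch_lnd *lnd)
{
    assert(!lnd->lock_held);
    lnd->lock_held = 1;
}

static void sketch_unlock(struct sketch_lnd *lnd)
{
    assert(lnd->lock_held);
    lnd->lock_held = 0;
}

/* Hypothetical stand-in for lnet_finalize(); it checks comment 1. */
static void sketch_finalize(struct sketch_lnd *lnd,
                            struct sketch_msg *msg, int status)
{
    assert(!lnd->lock_held);   /* no locks held when calling into LNet */
    msg->finalized++;
    msg->status = status;
}

static void sketch_tx_done(struct sketch_lnd *lnd,
                           struct sketch_tx *tx, int status)
{
    /* Take what finalize needs out of the descriptor first. */
    struct sketch_msg *msg = tx->msg;

    tx->msg = NULL;

    /* Finish with the descriptor: put it back on the idle queue. */
    sketch_lock(lnd);
    tx->next = lnd->idle;
    lnd->idle = tx;
    sketch_unlock(lnd);

    /* tx is not touched past this point; finalize comes last. */
    sketch_finalize(lnd, msg, status);
}
```

Keeping a local copy of the message pointer means nothing in the
descriptor is read after it goes back on the idle queue, and the
assertion inside sketch_finalize() would trip if the call were moved
inside the locked region.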


_______________________________________________
Lustre-devel mailing list
Lustre-devel@clusterfs.com
https://mail.clusterfs.com/mailman/listinfo/lustre-devel
