Re: [Pvfs2-developers] encoding negative responses

Phil Carns Fri, 10 Nov 2006 07:31:21 -0800

I have done some more work on this, but unfortunately I don't have a patch that I can share yet (it might be faster for you guys to just make the change if you want it; I can point out one little gotcha that I found in sys-rename.sm, but the rest is simple).

As far as I can tell after a good bit of testing with the modification in place, there are no _current_ operations that rely on decoding fields from negative responses.

Sam brought up a hypothetical scenario where you might want this ability (an O_EXCL create returns a handle on failure), but I would vote for making that a special case if it happens rather than having all negative responses get fully encoded by default. We can always add a check to let things like that drop through. Alternatively, for that kind of scenario we could send a positive response, and just have a flag in the response structure that indicates what really happened (i.e., I didn't actually do the creation, but here is your handle anyway) rather than using a negative error code for that purpose.
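The flag-based alternative could look roughly like the following. This is only a sketch under stated assumptions: `demo_create_resp`, `DEMO_CREATE_EXISTED`, and `demo_create_excl()` are hypothetical names invented for illustration and are not part of the PVFS2 request protocol, and a handle value of 0 is assumed to mean "no handle".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag: the object already existed, so nothing was created. */
#define DEMO_CREATE_EXISTED 0x1u

/* Hypothetical response body; not the real PVFS2 create response. */
struct demo_create_resp {
    int32_t  status;  /* stays 0, so the response is fully encoded */
    uint64_t handle;  /* handle of the new or pre-existing object */
    uint32_t flags;   /* DEMO_CREATE_EXISTED if no creation happened */
};

/* existing_handle == 0 means "nothing exists yet" in this sketch. */
struct demo_create_resp demo_create_excl(uint64_t existing_handle,
                                         uint64_t new_handle)
{
    struct demo_create_resp resp = { 0, 0, 0 };

    assert(new_handle != 0);

    if (existing_handle != 0) {
        /* Report what really happened with a flag, not a negative status. */
        resp.handle = existing_handle;
        resp.flags |= DEMO_CREATE_EXISTED;
    } else {
        resp.handle = new_handle;
    }
    return resp;
}
```

Because the status stays zero, the client can always decode the handle and then check the flag to learn whether the creation actually took place.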


-Phil

Rob Ross wrote:

Did we reach any sort of consensus on this idea?

Rob

Phil Carns wrote:
We've run into a couple of scenarios lately where a broken metadata file can cause the server to crash. The latest one looks like this:
------------------
- a metadata file exists, but the db entry for its datafile handles (the "dh" key) is missing for some reason
- a client requests a getattr on the file
- the server reads the basic attributes successfully, indicating that there is a dfile array of size x and a distribution of size y
- the server tries to read the dfile array but fails
- at this point the dfile array pointer in the attributes structure is non-zero but has garbage in it
- at this point the distribution pointer in the attributes structure is NULL
- the server state machine follows a failure path to send a response with an error status
- the encoder tries to encode the response, but segfaults because it thinks the distribution exists (the mask is set and the size is set to y) when really it is a NULL pointer
------------------
We've plugged some similar bugs before by just fixing the specific case (being more careful about cleaning up masks, pointers, sizes, etc. after an error), and the same could easily be done here. Should there be a more general solution, though?
What I am wondering is if the response encoder should even bother to encode the whole message if the status is non-zero. It seems like if there is an error code it should just stick to encoding the basic PVFS_server_resp struct. The rest of the fields can't really be trusted since we hit an error condition. Likewise on the decode side of things.
The lebf_encode_resp() function already sort of catches one case of this:
    /* we stand a good chance of segfaulting if we try to encode the response
     * after something bad happened reading data from disk. */
    if (resp->status != -PVFS_EIO)
    {

... but that only handles one specific error code.
I don't think it would be a real big change to do this in a more general manner. The biggest problem is that it throws off the size checking logic that checks buffer sizes after encoding, but that shouldn't be hard to adjust.
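A minimal sketch of the more general check, assuming a simplified response layout: `demo_resp` and `demo_encode_resp()` are hypothetical stand-ins, not the real PVFS_server_resp struct or lebf_encode_resp(), and values are copied in host byte order for brevity. The point is only that any non-zero status stops encoding after the common header, and that the returned length reflects the shorter message so size checks can be adjusted accordingly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical response: a common header plus one op-specific field. */
struct demo_resp {
    int32_t     op;
    int32_t     status;       /* 0 on success, negative on error */
    uint32_t    payload_len;  /* op-specific; untrusted if status != 0 */
    const char *payload;      /* op-specific; may be NULL or garbage */
};

/* Returns the number of bytes written, or 0 if the response cannot be
 * encoded (buffer too small or inconsistent success payload). */
size_t demo_encode_resp(const struct demo_resp *resp,
                        unsigned char *buf, size_t buf_len)
{
    size_t off = 0;

    assert(resp != NULL && buf != NULL);

    /* The common header is always encoded. */
    if (buf_len < sizeof(resp->op) + sizeof(resp->status))
        return 0;
    memcpy(buf + off, &resp->op, sizeof(resp->op));
    off += sizeof(resp->op);
    memcpy(buf + off, &resp->status, sizeof(resp->status));
    off += sizeof(resp->status);

    /* Any error status: stop here and never touch op-specific fields. */
    if (resp->status != 0)
        return off;

    if (resp->payload_len > 0 && resp->payload == NULL)
        return 0;
    if (buf_len - off < sizeof(resp->payload_len) ||
        buf_len - off - sizeof(resp->payload_len) < resp->payload_len)
        return 0;
    memcpy(buf + off, &resp->payload_len, sizeof(resp->payload_len));
    off += sizeof(resp->payload_len);
    if (resp->payload_len > 0)
        memcpy(buf + off, resp->payload, resp->payload_len);
    off += resp->payload_len;
    return off;
}
```

A decoder following the same rule would read the header first and skip the op-specific fields whenever the decoded status is non-zero.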
Any thoughts? Is this the right approach, or should we continue to encode/decode all of the request-specific fields regardless of error?
-Phil

_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers

