I have attached a patch that implements the proposed bug fixes. With
this applies, the encoder does not bother to encode or decode any
trailing data for error responses. We haven't seen any ill effects,
although a minor adjustment was needed to the sys-rename.sm state
machine which modifying the response structure after decoding.
-Phil
Phil Carns wrote:
I have done some more work on this, but unfortunately I don't have a
patch that I can share yet (it might be faster for you guys to just make
the change if you want it; I can point out one little gotcha that I
found in sys-rename.sm but the rest is simple).
As far as I can tell after a good bit of testing with the modification
in place, there are no _current_ operations that rely on decoding fields
from negative responses.
Sam brought up a hypothetical scenario where you might want this ability
(an O_EXCL create returns a handle on failure), but I would vote for
making that a special case if it happens rather than having all negative
responses get fully encoded by default. We can always add a check to
let things like that drop through. Alternatively for that kind of
scenario we could send a postitive response, and just have a flag in the
response structure that indicates what really happened (ie, I didn't
actually do the creation, but here is your handle anyway) rather than
using a negative error code for that purpose.
-Phil
Rob Ross wrote:
Did we reach any sort of consensus on this idea?
Rob
Phil Carns wrote:
We've run into a couple of scenarios lately where a broken metadata
file can cause the server to crash. The latest one looks like this:
------------------
- a metadata file exists, but the db entry for its datafile handles
(the "dh" key is missing for some reason
- a client requests a getattr on the file
- the server reads the basic attributes successfully, indicating that
there is a dfile array of size x and a distribution of size y
- the server tries to read the dfile array but fails
- at this point the dfile array pointer in the attributes structure
is non-zero but has garbage in it
- at this point the distribution pointer in the attributes structure
is NULL
- the server state machine follows a failure path to send a response
with an error status
- the encoder tries to encode the response, but segfaults because it
thinks the distribution exists (the mask is set and the size is set
to y) when really it is a NULL pointer
------------------
We've plugged some similar bugs before by just fixing the specific
case (being more careful about cleaning up masks, pointers, sizes,
etc. after an error), and the same could easily be done here. Should
there be a more general solution, though?
What I am wondering is if the response encoder should even bother to
encode the whole message if the status is non-zero. It seems like if
there is an error code it should just stick to encoding the basic
PVFS_server_resp struct. The rest of the fields can't really be
trusted since we hit an error condition. Likewise on the decode
side of things.
The lebf_encode_resp() function already sort of catches one case of
this:
/* we stand a good chance of segfaulting if we try to encode the
response
* after something bad happened reading data from disk. */
if (resp->status != -PVFS_EIO)
{
... but that only handles one specific error code.
I don't think it would be a real big change to do this in a more
general manner. The biggest problem is that it throws off the size
checking logic that checks buffer sizes after encoding, but that
shouldn't be hard to adjust.
Any thoughts? Is this the right approach, or should we continue to
encode/decode all of the request-specific regardless of error?
-Phil
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
Index: pvfs2_src/src/proto/PINT-le-bytefield.c
===================================================================
--- pvfs2_src/src/proto/PINT-le-bytefield.c (revision 2774)
+++ pvfs2_src/src/proto/PINT-le-bytefield.c (revision 2775)
@@ -423,7 +423,7 @@
/* we stand a good chance of segfaulting if we try to encode the response
* after something bad happened reading data from disk. */
- if (resp->status != -PVFS_EIO)
+ if (resp->status == 0)
{
/* extra encoding rules for particular responses */
@@ -606,7 +606,7 @@
decode_PVFS_server_resp(p, resp);
gossip_debug(GOSSIP_ENDECODE_DEBUG,"lebf_decode_resp\n");
- if (resp->status == -PVFS_EIO)
+ if (resp->status != 0)
goto out;
#define CASE(tag,var) \
@@ -778,87 +778,90 @@
}
} else if (input_type == PINT_DECODE_RESP) {
struct PVFS_server_resp *resp = &msg->stub_dec.resp;
- switch (resp->op) {
+ if(resp->status == 0)
+ {
+ switch (resp->op) {
- case PVFS_SERV_LOOKUP_PATH:
- {
- struct PVFS_servresp_lookup_path *lookup =
- &resp->u.lookup_path;
- decode_free(lookup->handle_array);
- decode_free(lookup->attr_array);
- break;
- }
+ case PVFS_SERV_LOOKUP_PATH:
+ {
+ struct PVFS_servresp_lookup_path *lookup =
+ &resp->u.lookup_path;
+ decode_free(lookup->handle_array);
+ decode_free(lookup->attr_array);
+ break;
+ }
- case PVFS_SERV_READDIR:
- decode_free(resp->u.readdir.dirent_array);
- break;
+ case PVFS_SERV_READDIR:
+ decode_free(resp->u.readdir.dirent_array);
+ break;
- case PVFS_SERV_MGMT_PERF_MON:
- decode_free(resp->u.mgmt_perf_mon.perf_array);
- break;
+ case PVFS_SERV_MGMT_PERF_MON:
+ decode_free(resp->u.mgmt_perf_mon.perf_array);
+ break;
- case PVFS_SERV_MGMT_ITERATE_HANDLES:
- decode_free(resp->u.mgmt_iterate_handles.handle_array);
- break;
+ case PVFS_SERV_MGMT_ITERATE_HANDLES:
+ decode_free(resp->u.mgmt_iterate_handles.handle_array);
+ break;
- case PVFS_SERV_MGMT_DSPACE_INFO_LIST:
- decode_free(resp->u.mgmt_dspace_info_list.dspace_info_array);
- break;
+ case PVFS_SERV_MGMT_DSPACE_INFO_LIST:
+ decode_free(resp->u.mgmt_dspace_info_list.dspace_info_array);
+ break;
- case PVFS_SERV_GETATTR:
- if (resp->u.getattr.attr.mask & PVFS_ATTR_META_DIST)
- decode_free(resp->u.getattr.attr.u.meta.dist);
- if (resp->u.getattr.attr.mask & PVFS_ATTR_META_DFILES)
- decode_free(resp->u.getattr.attr.u.meta.dfile_array);
- break;
+ case PVFS_SERV_GETATTR:
+ if (resp->u.getattr.attr.mask & PVFS_ATTR_META_DIST)
+ decode_free(resp->u.getattr.attr.u.meta.dist);
+ if (resp->u.getattr.attr.mask & PVFS_ATTR_META_DFILES)
+ decode_free(resp->u.getattr.attr.u.meta.dfile_array);
+ break;
- case PVFS_SERV_MGMT_EVENT_MON:
- decode_free(resp->u.mgmt_event_mon.event_array);
- break;
+ case PVFS_SERV_MGMT_EVENT_MON:
+ decode_free(resp->u.mgmt_event_mon.event_array);
+ break;
- case PVFS_SERV_GETEATTR:
- /* need a loop here? WBL */
- if (resp->u.geteattr.val)
- decode_free(resp->u.geteattr.val);
- break;
- case PVFS_SERV_LISTEATTR:
- if (resp->u.listeattr.key)
- decode_free(resp->u.listeattr.key);
- break;
+ case PVFS_SERV_GETEATTR:
+ /* need a loop here? WBL */
+ if (resp->u.geteattr.val)
+ decode_free(resp->u.geteattr.val);
+ break;
+ case PVFS_SERV_LISTEATTR:
+ if (resp->u.listeattr.key)
+ decode_free(resp->u.listeattr.key);
+ break;
- case PVFS_SERV_GETCONFIG:
- case PVFS_SERV_CREATE:
- case PVFS_SERV_REMOVE:
- case PVFS_SERV_MGMT_REMOVE_OBJECT:
- case PVFS_SERV_MGMT_REMOVE_DIRENT:
- case PVFS_SERV_MGMT_GET_DIRDATA_HANDLE:
- case PVFS_SERV_IO:
- case PVFS_SERV_SMALL_IO:
- case PVFS_SERV_SETATTR:
- case PVFS_SERV_SETEATTR:
- case PVFS_SERV_DELEATTR:
- case PVFS_SERV_CRDIRENT:
- case PVFS_SERV_RMDIRENT:
- case PVFS_SERV_CHDIRENT:
- case PVFS_SERV_TRUNCATE:
- case PVFS_SERV_MKDIR:
- case PVFS_SERV_FLUSH:
- case PVFS_SERV_MGMT_SETPARAM:
- case PVFS_SERV_MGMT_NOOP:
- case PVFS_SERV_STATFS:
- case PVFS_SERV_WRITE_COMPLETION:
- case PVFS_SERV_PROTO_ERROR:
- /* nothing to free */
- break;
+ case PVFS_SERV_GETCONFIG:
+ case PVFS_SERV_CREATE:
+ case PVFS_SERV_REMOVE:
+ case PVFS_SERV_MGMT_REMOVE_OBJECT:
+ case PVFS_SERV_MGMT_REMOVE_DIRENT:
+ case PVFS_SERV_MGMT_GET_DIRDATA_HANDLE:
+ case PVFS_SERV_IO:
+ case PVFS_SERV_SMALL_IO:
+ case PVFS_SERV_SETATTR:
+ case PVFS_SERV_SETEATTR:
+ case PVFS_SERV_DELEATTR:
+ case PVFS_SERV_CRDIRENT:
+ case PVFS_SERV_RMDIRENT:
+ case PVFS_SERV_CHDIRENT:
+ case PVFS_SERV_TRUNCATE:
+ case PVFS_SERV_MKDIR:
+ case PVFS_SERV_FLUSH:
+ case PVFS_SERV_MGMT_SETPARAM:
+ case PVFS_SERV_MGMT_NOOP:
+ case PVFS_SERV_STATFS:
+ case PVFS_SERV_WRITE_COMPLETION:
+ case PVFS_SERV_PROTO_ERROR:
+ /* nothing to free */
+ break;
- case PVFS_SERV_INVALID:
- case PVFS_SERV_PERF_UPDATE:
- case PVFS_SERV_JOB_TIMER:
- case PVFS_SERV_TROVE_SYNC_TIMER:
- gossip_lerr("%s: invalid response operation %d.\n",
- __func__, resp->op);
- break;
- }
+ case PVFS_SERV_INVALID:
+ case PVFS_SERV_PERF_UPDATE:
+ case PVFS_SERV_JOB_TIMER:
+ case PVFS_SERV_TROVE_SYNC_TIMER:
+ gossip_lerr("%s: invalid response operation %d.\n",
+ __func__, resp->op);
+ break;
+ }
+ }
}
}
Index: pvfs2_src/src/server/pvfs2-server.h
===================================================================
--- pvfs2_src/src/server/pvfs2-server.h (revision 2774)
+++ pvfs2_src/src/server/pvfs2-server.h (revision 2775)
@@ -313,6 +313,7 @@
PVFS_error* err_array;
PVFS_ds_keyval_handle_info keyval_handle_info;
PVFS_handle dirent_handle;
+ PVFS_ds_keyval dist_val;
};
/* this is used in both set_eattr, get_eattr and list_eattr */
Index: pvfs2_src/src/client/sysint/sys-rename.sm
===================================================================
--- pvfs2_src/src/client/sysint/sys-rename.sm (revision 2774)
+++ pvfs2_src/src/client/sysint/sys-rename.sm (revision 2775)
@@ -434,7 +434,13 @@
the target entry does not already exist, so it's
expected and is not an error.
*/
+ /* note: don't change the actual resp structure, because the
+ * decoder counts on this to know how to release data structures
+ */
+#if 0
resp_p->status = 0;
+#endif
+ return(0);
}
return resp_p->status;
}
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers