[EMAIL PROTECTED] wrote on Thu, 15 Jun 2006 13:55 -0400:
> I am having problems running dbench, from one client to one server
> through the VFS. Could be an older change that broke this, maybe up
> to a couple of weeks. Some of the "/bin/rm -rf" processes in the
> cleanup phase hang in D wait state, pretty consistently. Once
> managed to get the "busy inodes after unmount" complaint, and
> "pvfs2_kill_sb: (WARNING) number of inode allocs (270) != number of
> inode deallocs (267)" too. I'll try to narrow it down a bit.
The proximate cause of this is a coredump in pvfs2-client-core
resulting from a bad pointer dereference in lebf_decode_rel.
This gets called from msgpairarray.sm to release any allocated
resources that came about during decoding the response from the
server.
The server returned -1073741828 "Input/output error", which is
what I'll go look at next, but the client should not crash when
it gets this error.
The code path is src/client/sysint/msgpairarray.sm:523 and it hasn't
changed substantially for a few months. I've annotated what's
happening below:
ret = PINT_serv_decode_resp(msg_p->fs_id,
msg_p->encoded_resp_p,
&decoded_resp,
&msg_p->svr_addr,
msg_p->recv_status.actual_size,
&resp_p);
if (ret != 0)
{
PVFS_perror_gossip("msgpairarray decode error", ret);
msg_p->op_status = ret;
}
else
{
/* if we've made it this far, the server response status is
* meaningful, so we save it.
*/
msg_p->op_status = resp_p->status;
}
** ret was 0, but resp_p->status = -1073741828
/* NOTE: we call the function associated with each message,
* not just the one from the first array element. so
* there could in theory be different functions for each
* message (to handle different types of messages all in
* the same array).
*/
if (msg_p->comp_fn != NULL)
{
/* If we call the completion function, store the result on
* a per message pair basis. Also store some non-zero
* (failure) value in js_p->error_code if we see one.
*/
msg_p->op_status = msg_p->comp_fn(sm_p, resp_p, i);
if (msg_p->op_status != 0)
{
js_p->error_code = msg_p->op_status;
}
** it calls the completion function, lookup_segment_lookup_comp_fn,
that notices the negative resp_p->status, and just returns that
error code.
/* even if we see a failure, continue to process with the
* completion function. -- RobR
*/
}
else if (resp_p->status != 0)
{
** we do not take this code path (because we had a completion function),
thus fall through to ...
/* no comp_fn specified and status non-zero */
gossip_debug(GOSSIP_MSGPAIR_DEBUG,
"notice: msgpairarray_complete: error %d "
"from server %d\n", resp_p->status, i);
/* save a non-zero status to return if we see one */
js_p->error_code = resp_p->status;
/* If we don't have a completion function, there is no point
* in continuing to process after seeing a failure.
*/
if (js_p->error_code)
{
break;
}
}
** .. here, that leads to a coredump looking up uninitialized values
in decoded_resp for the lookup_path case.
/* free all the resources that we used to send and receive. */
ret = PINT_serv_free_msgpair_resources(
&msg_p->encoded_req, msg_p->encoded_resp_p, &decoded_resp,
&msg_p->svr_addr, msg_p->max_resp_sz);
Why the "else" there? Shouldn't we break as soon as this error
happens? The RobR comment has me confused.
-- Pete
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers