[EMAIL PROTECTED] wrote on Thu, 15 Jun 2006 13:55 -0400:
> I am having problems running dbench, from one client to one server
> through the VFS.  Could be an older change that broke this, maybe up
> to a couple of weeks.  Some of the "/bin/rm -rf" processes in the
> cleanup phase hang in D wait state, pretty consistently.  Once
> managed to get the "busy inodes after unmount" complaint, and
> "pvfs2_kill_sb: (WARNING) number of inode allocs (270) != number of
> inode deallocs (267)" too.  I'll try to narrow it down a bit.

The proximate cause of this is a coredump in pvfs2-client-core
resulting from a bad pointer dereference in lebf_decode_rel.

This gets called from msgpairarray.sm to release any resources
that were allocated while decoding the response from the server.

The server returned -1073741828 "Input/output error", which is
what I'll go look at next, but the client should not crash when
it gets this error.

The code path is src/client/sysint/msgpairarray.sm:523 and it hasn't
changed substantially for a few months.  I've annotated what's
happening below:

        ret = PINT_serv_decode_resp(msg_p->fs_id,
                                    msg_p->encoded_resp_p,
                                    &decoded_resp,
                                    &msg_p->svr_addr,
                                    msg_p->recv_status.actual_size,
                                    &resp_p);
        if (ret != 0)
        {
            PVFS_perror_gossip("msgpairarray decode error", ret);
            msg_p->op_status = ret;
        }
        else 
        {
            /* if we've made it this far, the server response status is
             * meaningful, so we save it.
             */
            msg_p->op_status = resp_p->status;
        }

** ret was 0, but resp_p->status = -1073741828

        /* NOTE: we call the function associated with each message,
         *       not just the one from the first array element.  so
         *       there could in theory be different functions for each
         *       message (to handle different types of messages all in
         *       the same array).
         */
        if (msg_p->comp_fn != NULL)
        {
            /* If we call the completion function, store the result on
             * a per message pair basis.  Also store some non-zero
             * (failure) value in js_p->error_code if we see one.
             */
            msg_p->op_status = msg_p->comp_fn(sm_p, resp_p, i);
            if (msg_p->op_status != 0)
            {
                js_p->error_code = msg_p->op_status;
            }

** it calls the completion function, lookup_segment_lookup_comp_fn,
which notices the negative resp_p->status and just returns that
error code.

            /* even if we see a failure, continue to process with the
             * completion function. -- RobR
             */
        }
        else if (resp_p->status != 0)
        {

** we do not take this code path (because we had a completion function),
thus fall through to ...

            /* no comp_fn specified and status non-zero */
            gossip_debug(GOSSIP_MSGPAIR_DEBUG,
                         "notice: msgpairarray_complete: error %d "
                         "from server %d\n", resp_p->status, i);

            /* save a non-zero status to return if we see one */
            js_p->error_code = resp_p->status;
            
            /* If we don't have a completion function, there is no point
             * in continuing to process after seeing a failure.
             */
            if (js_p->error_code)
            {
                break;
            }
        }

** ... here, which leads to a coredump when the free path looks up
uninitialized values in decoded_resp for the lookup_path case.

        /* free all the resources that we used to send and receive. */
        ret = PINT_serv_free_msgpair_resources(
            &msg_p->encoded_req, msg_p->encoded_resp_p, &decoded_resp,
            &msg_p->svr_addr, msg_p->max_resp_sz);
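
To make the failure mode concrete, here is a minimal stand-alone sketch
of a decode/release pair where the release step is only safe if the
fields that decode did not touch are known to be NULL.  Every name in
it (fake_decoded, fake_decode, fake_release) is a hypothetical stand-in
and not the PVFS2 API; it only illustrates the pattern, under the
assumption that an error status from the server leaves the decoded
structure unfilled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a decoded response; not the PVFS2 struct. */
struct fake_decoded {
    char *path_buf;  /* allocated only when a successful reply is decoded */
    int   count;
};

/* Decode step: on an error status from the server nothing is allocated
 * and the fields of *d are left exactly as the caller passed them in. */
static int fake_decode(int server_status, struct fake_decoded *d)
{
    if (server_status != 0)
        return 0;            /* decoding itself succeeded */
    d->path_buf = malloc(16);
    if (d->path_buf == NULL)
        return -1;
    strcpy(d->path_buf, "segment");
    d->count = 1;
    return 0;
}

/* Release step: only safe if untouched fields are NULL/0.
 * Returns 1 if there was a buffer to free, 0 otherwise. */
static int fake_release(struct fake_decoded *d)
{
    int had_buf;

    assert(d != NULL);
    had_buf = (d->path_buf != NULL);
    free(d->path_buf);       /* free(NULL) is a no-op */
    d->path_buf = NULL;
    d->count = 0;
    return had_buf;
}
```

If the caller declares the structure as "struct fake_decoded d = {0};"
before decoding, fake_release(&d) is harmless on the error path.
Without that initialization, path_buf holds whatever was on the stack
and the free() is a bad pointer dereference waiting to happen.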



Why the "else" there?  Shouldn't we break as soon as this error
happens?  The RobR comment has me confused.
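
If the answer is yes, the loop would look roughly like the sketch
below: the first non-zero status stops processing whether it came from
a completion function or straight from the server.  The types and names
(fake_msg, complete_all, pass_through_comp_fn) are hypothetical and much
simpler than the real state machine code, and the sketch does not
address whether later messages still need their resources released.

```c
#include <stddef.h>

/* Hypothetical stand-ins; not the PVFS2 types. */
typedef int (*comp_fn_t)(int resp_status);

struct fake_msg {
    int       resp_status;  /* status the server returned */
    comp_fn_t comp_fn;      /* optional per-message completion function */
    int       op_status;    /* result recorded for this message */
};

/* Completion function that just passes a server error through. */
static int pass_through_comp_fn(int resp_status)
{
    return resp_status;
}

/*
 * Walk the messages and stop at the first failure, whichever path
 * reported it.  Returns the number of messages examined; *error_code
 * receives the first non-zero status, or 0 if every message succeeded.
 */
static size_t complete_all(struct fake_msg *msgs, size_t n, int *error_code)
{
    size_t i;

    *error_code = 0;
    for (i = 0; i < n; i++) {
        struct fake_msg *m = &msgs[i];

        /* Use the completion function if there is one, otherwise
         * take the server status as the result for this message. */
        m->op_status = m->comp_fn ? m->comp_fn(m->resp_status)
                                  : m->resp_status;
        if (m->op_status != 0) {
            *error_code = m->op_status;
            return i + 1;   /* break on the first error from either path */
        }
    }
    return n;
}
```

With three messages where the second carries an error status,
complete_all() examines two of them and never reaches the third.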

                -- Pete