Hi Pete,

It has been quite some time since I wrote that comment, so I don't recall exactly what I was thinking. I can, however, take a guess...

I think I was concerned that there might be request-specific actions that we might want to take in the event of a failure just as there are in the event of success. So I wanted to make sure that the function was called on failures.

What's actually just as curious to me is that we don't ever call PINT_serv_free_msgpair_resources() in error cases if we *don't* succeed. Does this mean that every msgpairarray failure results in some resources being leaked? Making that "else" block would effectively just skip this code, avoiding the core dump but not freeing the resources.

Perhaps we should have PINT_serv_decode_resp() initialize decoded_resp prior to exit and have PINT_serv_free_msgpair_resources() understand a NULL value for that input parameter?

Rob

Pete Wyckoff wrote:
[EMAIL PROTECTED] wrote on Thu, 15 Jun 2006 13:55 -0400:
I am having problems running dbench, from one client to one server
through the VFS.  Could be an older change that broke this, maybe up
to a couple of weeks.  Some of the "/bin/rm -rf" processes in the
cleanup phase hang in D wait state, pretty consistently.  Once
managed to get the "busy inodes after unmount" complaint, and
"pvfs2_kill_sb: (WARNING) number of inode allocs (270) != number of
inode deallocs (267)" too.  I'll try to narrow it down a bit.

The proximate cause of this is a coredump in pvfs2-client-core
resulting from a bad pointer dereference in lebf_decode_rel.

This gets called from msgpairarray.sm to release any allocated
resources that came about during decoding the response from the
server.

The server returned -1073741828 "Input/output error", which is
what I'll go look at next, but the client should not crash when
it gets this error.

The code path is src/client/sysint/msgpairarray.sm:523 and it hasn't
changed substantially for a few months.  I've annotated what's
happening below:

        ret = PINT_serv_decode_resp(msg_p->fs_id,
                                    msg_p->encoded_resp_p,
                                    &decoded_resp,
                                    &msg_p->svr_addr,
                                    msg_p->recv_status.actual_size,
                                    &resp_p);
        if (ret != 0)
        {
            PVFS_perror_gossip("msgpairarray decode error", ret);
            msg_p->op_status = ret;
        }
else {
            /* if we've made it this far, the server response status is
             * meaningful, so we save it.
             */
            msg_p->op_status = resp_p->status;
        }

** ret was 0, but resp_p->status = -1073741828

        /* NOTE: we call the function associated with each message,
         *       not just the one from the first array element.  so
         *       there could in theory be different functions for each
         *       message (to handle different types of messages all in
         *       the same array).
         */
        if (msg_p->comp_fn != NULL)
        {
            /* If we call the completion function, store the result on
             * a per message pair basis.  Also store some non-zero
             * (failure) value in js_p->error_code if we see one.
             */
            msg_p->op_status = msg_p->comp_fn(sm_p, resp_p, i);
            if (msg_p->op_status != 0)
            {
                js_p->error_code = msg_p->op_status;
            }

** it calls the completion function, lookup_segment_lookup_comp_fn,
that notices the negative resp_p->status, and just returns that
error code.

            /* even if we see a failure, continue to process with the
             * completion function. -- RobR
             */
        }
        else if (resp_p->status != 0)
        {

** we do not take this code path (because we had a completion function),
thus fall through to ...

            /* no comp_fn specified and status non-zero */
            gossip_debug(GOSSIP_MSGPAIR_DEBUG,
                         "notice: msgpairarray_complete: error %d "
                         "from server %d\n", resp_p->status, i);

            /* save a non-zero status to return if we see one */
            js_p->error_code = resp_p->status;
/* If we don't have a completion function, there is no point
             * in continuing to process after seeing a failure.
             */
            if (js_p->error_code)
            {
                break;
            }
        }

** .. here, that leads to a coredump looking up uninitialized values
in decoded_resp for the lookup_path case.

        /* free all the resources that we used to send and receive. */
        ret = PINT_serv_free_msgpair_resources(
            &msg_p->encoded_req, msg_p->encoded_resp_p, &decoded_resp,
            &msg_p->svr_addr, msg_p->max_resp_sz);



Why the "else" there?  Shouldn't we break as soon as this error
happens?  The RobR comment has me confused.

                -- Pete
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers

_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers

Reply via email to