Roland Dreier wrote:
> I don't believe we should generate receive callbacks for canceled
> sends, so I came up with the patch below (much simpler than the
> explanation that led up to it).  I am no longer able to reproduce the
> IPoIB crash with this applied so I feel pretty good about this.

I agree that this should be the case.  If you look in ib_find_send_mad(), it 
checks that wr->status for the send is still IB_WC_SUCCESS, but only on one 
of the two return paths: the send_list loop has the check, and the wait_list 
loop does not.  I think we either want to fix the problem in 
ib_find_send_mad() by checking the status on both paths, or remove the 
status check there.

struct ib_mad_send_wr_private*
ib_find_send_mad(struct ib_mad_agent_private *mad_agent_priv,
                 struct ib_mad_recv_wc *wc)
{
        struct ib_mad_send_wr_private *wr;
        struct ib_mad *mad;

        mad = (struct ib_mad *)wc->recv_buf.mad;

        list_for_each_entry(wr, &mad_agent_priv->wait_list, agent_list) {
                if ((wr->tid == mad->mad_hdr.tid) &&
                    rcv_has_same_class(wr, wc) &&
                    /*
                     * Don't check GID for direct routed MADs.
                     * These might have permissive LIDs.
                     */
                    (is_direct(wc->recv_buf.mad->mad_hdr.mgmt_class) ||
                     rcv_has_same_gid(mad_agent_priv, wr, wc)))
                        return wr;
*** Missing check for status == SUCCESS
        }

        /*
         * It's possible to receive the response before we've
         * been notified that the send has completed
         */
        list_for_each_entry(wr, &mad_agent_priv->send_list, agent_list) {
                if (is_data_mad(mad_agent_priv, wr->send_buf.mad) &&
                    wr->tid == mad->mad_hdr.tid &&
                    wr->timeout &&
                    rcv_has_same_class(wr, wc) &&
                    /*
                     * Don't check GID for direct routed MADs.
                     * These might have permissive LIDs.
                     */
                    (is_direct(wc->recv_buf.mad->mad_hdr.mgmt_class) ||
                     rcv_has_same_gid(mad_agent_priv, wr, wc)))
                        /* Verify request has not been canceled */
                        return (wr->status == IB_WC_SUCCESS) ? wr : NULL;
*** Has check against canceled MADs
        }
        return NULL;
}
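
To illustrate the first option, here is a rough user-space model with the
cancel check on both return paths.  It is only a sketch: struct send_wr,
find_send() and the WC_* values are made-up stand-ins for the kernel types,
the lists are plain arrays, and the class/GID matching is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel definitions, for illustration only. */
enum wc_status { WC_SUCCESS = 0, WC_CANCELED = 1 };

struct send_wr {
        uint64_t tid;
        enum wc_status status;  /* anything but WC_SUCCESS means canceled */
        int timeout;            /* non-zero if a response is expected */
};

/*
 * Model of the lookup with the status check applied on both paths.
 * waitq/sendq play the role of wait_list/send_list.
 */
struct send_wr *find_send(struct send_wr *waitq, size_t n_wait,
                          struct send_wr *sendq, size_t n_send,
                          uint64_t tid)
{
        size_t i;

        assert(waitq != NULL || n_wait == 0);
        assert(sendq != NULL || n_send == 0);

        for (i = 0; i < n_wait; i++) {
                if (waitq[i].tid == tid)
                        /* Same cancel check the send_list path already has */
                        return (waitq[i].status == WC_SUCCESS) ?
                                &waitq[i] : NULL;
        }

        /* The response may arrive before the send completion is reported */
        for (i = 0; i < n_send; i++) {
                if (sendq[i].tid == tid && sendq[i].timeout)
                        /* Verify request has not been canceled */
                        return (sendq[i].status == WC_SUCCESS) ?
                                &sendq[i] : NULL;
        }
        return NULL;
}
```

With this shape, a response whose TID matches a canceled request is dropped
no matter which list the request is sitting on.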

- Sean
