I've been struggling with crashes in mthca_arbel_map_phys (triggered by
RDS) for a few days, and I think I'm finally making some progress.

mthca_fmr_alloc does this:

        if (mthca_is_memfree(dev)) {
                err = mthca_table_get(dev, dev->mr_table.mpt_table, key);
                if (err)
                        goto err_out_mpt_free;
        ...
        }

        /* when we get here, err == 0 (at least for memfree cards) */
        mr->mtt = __mthca_alloc_mtt(dev, list_len, dev->mr_table.fmr_mtt_buddy);
        if (IS_ERR(mr->mtt))
                goto err_out_table;

err_out_table:
        /* clean up some */
        return err;

I.e., we set mr->mtt to some ERR_PTR(-whatever) and return success.

The same problem exists when mailbox allocation fails.
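To make the failure mode concrete outside the kernel, here is a minimal userspace sketch of the same shape. ERR_PTR, IS_ERR and PTR_ERR are simplified stand-ins for the kernel's <linux/err.h> helpers (an assumption, not the real header), and fake_mtt, fake_fmr, alloc_mtt_fail, fmr_alloc_buggy and fmr_alloc_fixed are hypothetical names, not mthca code. The allocator always fails, so the only difference between the two functions is whether err is set before the goto:

```c
#include <errno.h>
#include <stdint.h>

/* Simplified userspace stand-ins for <linux/err.h> (assumption, not the
 * kernel header): the top MAX_ERRNO pointer values encode -errno. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct fake_mtt { int first_seg; };	/* hypothetical MTT stand-in */
struct fake_fmr { struct fake_mtt *mtt; };	/* hypothetical FMR stand-in */

/* Hypothetical allocator that always fails, like __mthca_alloc_mtt
 * running out of MTT segments. */
static struct fake_mtt *alloc_mtt_fail(void)
{
	return ERR_PTR(-ENOMEM);
}

/* Buggy shape: err is still 0 when we jump to the error label. */
static int fmr_alloc_buggy(struct fake_fmr *mr)
{
	int err = 0;

	mr->mtt = alloc_mtt_fail();
	if (IS_ERR(mr->mtt))
		goto err_out;
	return 0;

err_out:
	return err;	/* 0: the caller believes the allocation succeeded */
}

/* Fixed shape: decode the errno from the error pointer before the goto. */
static int fmr_alloc_fixed(struct fake_fmr *mr)
{
	int err = 0;

	mr->mtt = alloc_mtt_fail();
	if (IS_ERR(mr->mtt)) {
		err = PTR_ERR(mr->mtt);
		goto err_out;
	}
	return 0;

err_out:
	return err;	/* -ENOMEM */
}
```

fmr_alloc_buggy() returns 0 while leaving mr->mtt holding ERR_PTR(-ENOMEM), which is exactly what a later dereference trips over; fmr_alloc_fixed() returns -ENOMEM, which is what the patch below does for both the MTT and the mailbox paths.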

I fixed this with the patch below, and that got me a bit further.
First, the kernel reports:

RDS/IB: ib_alloc_fmr failed (err=-12)

which is good - now we get a decent error code instead of a crash.
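As for where the -12 comes from: on Linux ENOMEM is 12, so PTR_ERR(ERR_PTR(-ENOMEM)) is -12. Below is a rough sketch of how an int error might travel up to a consumer like RDS, assuming the layer above mthca_fmr_alloc frees its handle and wraps a nonzero return in ERR_PTR. The names alloc_fmr, low_level_alloc, consumer_setup and fake_fmr are hypothetical, and the err.h helpers are again simplified userspace stand-ins:

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Simplified userspace stand-ins for <linux/err.h> (assumption). */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct fake_fmr { int key; };	/* hypothetical FMR handle */

/* Hypothetical low-level allocator (int return, 0 on success) that fails
 * with -ENOMEM, as mthca_fmr_alloc now does when the MTT allocation fails. */
static int low_level_alloc(struct fake_fmr *fmr)
{
	(void)fmr;
	return -ENOMEM;
}

/* Hypothetical wrapper: frees its handle and turns a nonzero int error
 * into an error pointer for its own caller. */
static struct fake_fmr *alloc_fmr(void)
{
	struct fake_fmr *fmr = malloc(sizeof *fmr);
	int err;

	if (!fmr)
		return ERR_PTR(-ENOMEM);
	err = low_level_alloc(fmr);
	if (err) {
		free(fmr);
		return ERR_PTR(err);
	}
	return fmr;
}

/* Hypothetical consumer that logs and returns the decoded errno,
 * modeled loosely on the RDS message quoted above. */
static int consumer_setup(void)
{
	struct fake_fmr *fmr = alloc_fmr();

	if (IS_ERR(fmr)) {
		printf("alloc_fmr failed (err=%ld)\n", PTR_ERR(fmr));
		return (int)PTR_ERR(fmr);
	}
	free(fmr);
	return 0;
}
```

With the old behavior the wrapper would have seen 0 and handed back a valid-looking handle whose mtt was an error pointer, so the failure would only surface later as a crash in the map path.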
A little later, it complains:

ib_mthca 0000:05:00.0: SW2HW_MPT returned status 0x0a

which doesn't sound quite as good, and things are badly hosed from
that moment on. Reloading ib_mthca seems to fix things, however.

Olaf
-- 
Olaf Kirch  |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |    / | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
--------------- snip -------------------
From: Olaf Kirch <[EMAIL PROTECTED]>
Subject: Return proper error codes from mthca_fmr_alloc

If allocation of the MTT or the mailbox failed, mthca_fmr_alloc would
still return 0 (success). This leads to crashes a little further down
the road, when we try to dereference e.g. mr->mtt, which was really
ERR_PTR(-ENOMEM).

Signed-off-by: Olaf Kirch <[EMAIL PROTECTED]>
---
 drivers/infiniband/hw/mthca/mthca_mr.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Index: ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_mr.c
===================================================================
--- ofa_kernel-1.3.orig/drivers/infiniband/hw/mthca/mthca_mr.c
+++ ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_mr.c
@@ -613,8 +613,10 @@ int mthca_fmr_alloc(struct mthca_dev *de
                        sizeof *(mr->mem.tavor.mpt) * idx;
 
        mr->mtt = __mthca_alloc_mtt(dev, list_len, dev->mr_table.fmr_mtt_buddy);
-       if (IS_ERR(mr->mtt))
+       if (IS_ERR(mr->mtt)) {
+               err = PTR_ERR(mr->mtt);
                goto err_out_table;
+       }
 
        mtt_seg = mr->mtt->first_seg * MTHCA_MTT_SEG_SIZE;
 
@@ -627,8 +629,10 @@ int mthca_fmr_alloc(struct mthca_dev *de
                mr->mem.tavor.mtts = dev->mr_table.tavor_fmr.mtt_base + mtt_seg;
 
        mailbox = mthca_alloc_mailbox(dev, GFP_KERNEL);
-       if (IS_ERR(mailbox))
+       if (IS_ERR(mailbox)) {
+               err = PTR_ERR(mailbox);
                goto err_out_free_mtt;
+       }
 
        mpt_entry = mailbox->buf;
 


_______________________________________________
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
