I've been struggling with crashes in mthca_arbel_map_phys for a few days (triggered by RDS), and I think I'm finally making some progress.

mthca_fmr_alloc does this:

	if (mthca_is_memfree(dev)) {
		err = mthca_table_get(dev, dev->mr_table.mpt_table, key);
		if (err)
			goto err_out_mpt_free;
		...
	}

	/* when we get here, err == 0 (at least for memfree cards) */
	mr->mtt = __mthca_alloc_mtt(dev, list_len, dev->mr_table.fmr_mtt_buddy);
	if (IS_ERR(mr->mtt))
		goto err_out_table;

err_out_table:
	/* clean up some */
	return err;

ie we set mr->mtt to some ERR_PTR(-whatever), and return success.
The same problem exists when mailbox allocation fails.

I fixed this, using the patch below. Now I'm making some progress:
First, the kernel reports:

	RDS/IB: ib_alloc_fmr failed (err=-12)

which is good - now we get a decent error code instead of a crash.
A little later, it complains:

	ib_mthca 0000:05:00.0: SW2HW_MPT returned status 0x0a

which doesn't sound quite as good... and things are very hosed from
that moment on; reloading ib_mthca seems to fix things, however.

Olaf
-- 
Olaf Kirch        |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |    / | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax

--------------- snip -------------------
From: Olaf Kirch <[EMAIL PROTECTED]>
Subject: Return proper error codes from mthca_fmr_alloc

If the allocation of the MTT or the mailbox failed, mthca_fmr_alloc
would return 0 (success) no matter what. This leads to crashes a
little down the road, when we try to dereference eg mr->mtt, which
was really ERR_PTR(-ENOMEM).

Signed-off-by: Olaf Kirch <[EMAIL PROTECTED]>
---
 drivers/infiniband/hw/mthca/mthca_mr.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Index: ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_mr.c
===================================================================
--- ofa_kernel-1.3.orig/drivers/infiniband/hw/mthca/mthca_mr.c
+++ ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_mr.c
@@ -613,8 +613,10 @@ int mthca_fmr_alloc(struct mthca_dev *de
 			sizeof *(mr->mem.tavor.mpt) * idx;
 
 	mr->mtt = __mthca_alloc_mtt(dev, list_len, dev->mr_table.fmr_mtt_buddy);
-	if (IS_ERR(mr->mtt))
+	if (IS_ERR(mr->mtt)) {
+		err = PTR_ERR(mr->mtt);
 		goto err_out_table;
+	}
 
 	mtt_seg = mr->mtt->first_seg * MTHCA_MTT_SEG_SIZE;
 
@@ -627,8 +629,10 @@ int mthca_fmr_alloc(struct mthca_dev *de
 		mr->mem.tavor.mtts = dev->mr_table.tavor_fmr.mtt_base + mtt_seg;
 
 	mailbox = mthca_alloc_mailbox(dev, GFP_KERNEL);
-	if (IS_ERR(mailbox))
+	if (IS_ERR(mailbox)) {
+		err = PTR_ERR(mailbox);
 		goto err_out_free_mtt;
+	}
 
 	mpt_entry = mailbox->buf;
 
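
For anyone who wants to see the failure mode outside the kernel, here is a rough user-space sketch of the pattern the patch addresses. It is not mthca code: ERR_PTR, PTR_ERR and IS_ERR are simplified stand-ins for the helpers in <linux/err.h>, and demo_alloc, demo_setup_buggy and demo_setup_fixed are hypothetical names made up for illustration.

```c
#include <assert.h>	/* only needed for checks like the ones described below */
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified user-space stand-ins for the kernel's <linux/err.h> helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical allocator that reports failure via an error pointer. */
static void *demo_alloc(int should_fail)
{
	void *p;

	if (should_fail)
		return ERR_PTR(-ENOMEM);
	p = malloc(16);
	return p ? p : ERR_PTR(-ENOMEM);
}

/* Buggy shape: err still holds 0 from an earlier step that succeeded. */
static int demo_setup_buggy(int should_fail, void **out)
{
	int err = 0;	/* pretend an earlier call already returned 0 */

	*out = demo_alloc(should_fail);
	if (IS_ERR(*out))
		goto err_out;	/* err is never updated */
	return 0;

err_out:
	return err;	/* reports success while *out is an error pointer */
}

/* Fixed shape: pull the error code out of the pointer before bailing out. */
static int demo_setup_fixed(int should_fail, void **out)
{
	int err = 0;

	*out = demo_alloc(should_fail);
	if (IS_ERR(*out)) {
		err = (int)PTR_ERR(*out);
		goto err_out;
	}
	return 0;

err_out:
	return err;
}
```

With a forced failure, demo_setup_buggy(1, &p) returns 0 and leaves p as an error pointer, so a caller that trusts the return value will dereference it later. demo_setup_fixed(1, &p) returns -ENOMEM instead, which matches the err=-12 that RDS now reports.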