Quoting Roland Dreier <[EMAIL PROTECTED]>:
> Subject: Re: [PATCH] cm refcount race fix
> 
> It does seem we can simplify mthca_cq in a slightly different way.
> mthca_cq_clean() doesn't need to take a CQ reference, because we know
> the CQ can't go away before all associated QPs are gone, and at least
> one QP will stay around until mthca_cq_clean() returns.
> 
> So the below patch is both a fix and a decent cleanup:
> 
> --- infiniband/hw/mthca/mthca_provider.h      (revision 6945)
> +++ infiniband/hw/mthca/mthca_provider.h      (working copy)
> @@ -197,7 +197,7 @@ struct mthca_cq_resize {
>  struct mthca_cq {
>       struct ib_cq            ibcq;
>       spinlock_t              lock;
> -     atomic_t                refcount;
> +     int                     refcount;
>       int                     cqn;
>       u32                     cons_index;
>       struct mthca_cq_buf     buf;
> --- infiniband/hw/mthca/mthca_dev.h   (revision 6945)
> +++ infiniband/hw/mthca/mthca_dev.h   (working copy)
> @@ -496,7 +496,7 @@ void mthca_free_cq(struct mthca_dev *dev
>  void mthca_cq_completion(struct mthca_dev *dev, u32 cqn);
>  void mthca_cq_event(struct mthca_dev *dev, u32 cqn,
>                   enum ib_event_type event_type);
> -void mthca_cq_clean(struct mthca_dev *dev, u32 cqn, u32 qpn,
> +void mthca_cq_clean(struct mthca_dev *dev, struct mthca_cq *cq, u32 qpn,
>                   struct mthca_srq *srq);
>  void mthca_cq_resize_copy_cqes(struct mthca_cq *cq);
>  int mthca_alloc_cq_buf(struct mthca_dev *dev, struct mthca_cq_buf *buf, int nent);
> --- infiniband/hw/mthca/mthca_cq.c    (revision 6945)
> +++ infiniband/hw/mthca/mthca_cq.c    (working copy)
> @@ -234,14 +234,19 @@ void mthca_cq_event(struct mthca_dev *de
>  {
>       struct mthca_cq *cq;
>       struct ib_event event;
> +     unsigned long flags;
>  
> -     spin_lock(&dev->cq_table.lock);
> +     spin_lock_irqsave(&dev->cq_table.lock, flags);
>  
>       cq = mthca_array_get(&dev->cq_table.cq, cqn & (dev->limits.num_cqs - 1));
>  
> -     if (cq)
> -             atomic_inc(&cq->refcount);
> -     spin_unlock(&dev->cq_table.lock);
> +     if (cq) {
> +             spin_lock(&cq->lock);
> +             ++cq->refcount;
> +             spin_unlock(&cq->lock);
> +     }
> +
> +     spin_unlock_irqrestore(&dev->cq_table.lock, flags);


Hmm. I see you take cq->lock inside cq_table.lock.

OTOH in mthca_qp.c we have:

        spin_lock_irq(&send_cq->lock);
        if (send_cq != recv_cq)
                spin_lock(&recv_cq->lock);

        spin_lock(&dev->qp_table.lock);

So qp_table.lock is taken inside cq->lock.
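
In other words (a condensed sketch of the two paths above, not the
literal code):

        /* mthca_cq_event(), with the patch applied: */
        spin_lock_irqsave(&dev->cq_table.lock, flags);  /* cq_table.lock outer */
        spin_lock(&cq->lock);                           /*   cq->lock inner    */
        ++cq->refcount;
        spin_unlock(&cq->lock);
        spin_unlock_irqrestore(&dev->cq_table.lock, flags);

        /* mthca_qp.c, with send_cq == recv_cq: */
        spin_lock_irq(&send_cq->lock);                  /*   cq->lock outer    */
        spin_lock(&dev->qp_table.lock);                 /* qp_table.lock inner */

So cq->lock now sits below one table lock but above another.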

I can't prove it's a problem, but the locking rules are getting confusing -
it was better when all the table locks were innermost.

As a solution, we could decide that the cq refcount is protected by
cq_table.lock. What do you say?
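
Something along these lines (untested sketch, and mthca_cq_get is just
an illustrative name):

        static struct mthca_cq *mthca_cq_get(struct mthca_dev *dev, u32 cqn)
        {
                struct mthca_cq *cq;
                unsigned long flags;

                spin_lock_irqsave(&dev->cq_table.lock, flags);
                cq = mthca_array_get(&dev->cq_table.cq,
                                     cqn & (dev->limits.num_cqs - 1));
                if (cq)
                        ++cq->refcount; /* protected by cq_table.lock */
                spin_unlock_irqrestore(&dev->cq_table.lock, flags);

                return cq;
        }

That way mthca_cq_event() never needs to nest cq->lock inside
cq_table.lock, and the table locks stay innermost.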


-- 
MST