Quoting Sean Hefty <[EMAIL PROTECTED]>:
> >And I'm worried about the extensive use of atomic operations
> >this patch introduces - both performance and race-condition-wise.
> >Can't we stick to simple locking? Replacing completion with a single
> >BUSY bit looks especially scary.
> 
> I reworked the locking for this 3-4 times before I came up with something
> that I felt was simple, yet could still do the job.  The difficulty is that
> multiple threads can try to join the same group at once, each with different
> join parameters, while existing members leave it.

No, I was only speaking about IPoIB part of the patch.
I didn't look at atomic usage inside the new module yet.

> Or a user can leave a group before
> their initial join request even completes.  The latter is a problem even for a
> single user, such as ipoib, which doesn't handle it.

Sorry, I don't follow.
Could you please give a scenario where ipoib fails but multicast module
works?

> I really don't think the performance of the code is as much an issue versus 
> the
> time required to configure the fabric.

You might be right.  But I wonder whether we'll later regret switching
to the slower generic code when we already had a stable,
streamlined version.

Speaking of which - there seem to be some linear scans of outstanding
requests, which could be a scalability problem if there's a
large number of these, isn't that right?

> >> -  mutex_lock(&mcast_mutex);
> >> +  /* Clear the busy flag so we try again */
> >> +  status = test_and_clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags);
> >>
> >> +  mutex_lock(&mcast_mutex);
> >>    spin_lock_irq(&priv->lock);
> >> -  mcast->query = NULL;
> >> -
> >> -  if (test_bit(IPOIB_MCAST_RUN, &priv->flags)) {
> >> -          if (status == -ETIMEDOUT)
> >> -                  queue_work(ipoib_workqueue, &priv->mcast_task);
> >> -          else
> >> -                  queue_delayed_work(ipoib_workqueue, &priv->mcast_task,
> >> -                                     mcast->backoff * HZ);
> >> -  } else
> >> -          complete(&mcast->done);
> >> +  if (test_bit(IPOIB_MCAST_RUN, &priv->flags))
> >> +          queue_delayed_work(ipoib_workqueue, &priv->mcast_task,
> >> +                             mcast->backoff * HZ);
> >>    spin_unlock_irq(&priv->lock);
> >>    mutex_unlock(&mcast_mutex);
> >>
> >> -  return;
> >> +  return status;
> >>  }
> >
> >We used to do complete last thing on mcast object, now you are
> >touching the object after IPOIB_MCAST_FLAG_BUSY is cleared, apparently.
> 
> The patch removes the need to check both mcast->query and a flag to determine
> state.  It only uses the flag.  Can you clarify what issue you see with the
> above code?

We used to have a completion to signal that no callbacks are running, and we
set it as the last thing.  Now there seems to be a window after BUSY is cleared
where you are still accessing mcast->backoff.  No?


> >What prevents us from exiting while callbacks are in progress?
> >Basically same applies wherever we used to call wait_for_mcast_join.
> 
> ib_free_multicast() blocks while any callbacks are running.

I don't follow.  ib_free_multicast seems to be called only when we
leave all mcast groups.  So it seems callbacks can now run
after we do stop_thread, which I think will lead to crashes.

> >I also started to wonder why we need a new API for this at all.
> >Can't the sa module be fixed to refcount the mcast joins properly for us,
> >with minor or no API changes?
> 
> The difference is that this allows the free to match up with exactly one join
> request.  This will be needed for userspace support.  Additionally, the
> callback remains active beyond the initial join, so that the multicast module
> can notify the user when an error occurs on the multicast group that requires
> re-joining.

But I still don't understand - if everyone must use the refcounted API,
why do we need a separate module and an exported API for something
that is just an implementation detail?

I also have the following question on the API:

> +struct ib_multicast {
> +       struct ib_sa_mcmember_rec rec;
> +       ib_sa_comp_mask         comp_mask;
> +       int                     (*callback)(int status,
> +                                           struct ib_multicast *multicast);
> +       void                    *context;
> +};

Multiple distinct mcmember_rec queries could get us the same mcast group.
Say, MTU > 256 and MTU > 512 look different but will actually get the
same group in practice.

Say 2 clients ask with these 2 queries.
What will be in the ib_sa_mcmember_rec in this case?

-- 
MST
