> The problem isn't whether cond_wait is atomic. It's what happens to the algorithm if it's not.
> Imagine a scenario where it's not atomic (this was the part where I agreed with Tom): the thread holds the mutex inside the cond_wait call, but it isn't on the wait queue yet.
> Now this thread gets interrupted by some event and you perform a ...cond_broadcast(). All the waiting threads are woken up except the one that isn't on the queue yet. You have a thread waiting on a cond_var you weren't aware of... more precisely, that thread is waiting on the condition variable even though you have already performed the broadcast. That's pretty weird for the algorithm.
Okay, but I don't agree that this situation would cause a problem in correctly written CV code. As long as the trying-to-sleep thread still holds the mutex when the broadcast() occurs, no other thread can have changed the shared state in the meantime. Its *reason* for sleeping is therefore still true, and some other thread will eventually come along, change that state, and wake it up.
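Here is a minimal sketch of the "correctly written CV code" meant here, in C with POSIX threads. It assumes the usual producer/consumer pattern. All names (`ready`, `woken`, `consumer`, `run_broadcast_demo`) are hypothetical and do not come from the OpenAFS callback code. Each waiter re-tests its predicate in a `while` loop while holding the mutex. A thread that had not yet blocked when the broadcast happened therefore sees the changed state and never sleeps.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical shared state for this sketch (not from OpenAFS). */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv   = PTHREAD_COND_INITIALIZER;
static bool ready = false; /* predicate: a consumer sleeps only while !ready */
static int  woken = 0;     /* consumers that observed ready == true */

static void *consumer(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    /* Re-test the predicate in a loop. A spurious wakeup, or a broadcast
       issued before this thread reached the wait, is harmless: the thread
       only blocks while the predicate is still false. */
    while (!ready)
        pthread_cond_wait(&cv, &lock);
    assert(ready);
    woken++;
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Start up to 64 consumers, then make the predicate true and broadcast.
   Returns how many consumers saw the change, or -1 if n is out of range. */
int run_broadcast_demo(int n)
{
    pthread_t tids[64];
    int created = 0;

    if (n < 0 || n > 64)
        return -1;

    pthread_mutex_lock(&lock);
    ready = false;
    woken = 0;
    pthread_mutex_unlock(&lock);

    for (int i = 0; i < n; i++) {
        if (pthread_create(&tids[i], NULL, consumer, NULL) != 0)
            break;
        created++;
    }

    /* Producer: change the predicate only while holding the mutex, then
       broadcast. Consumers that have not blocked yet will find
       ready == true on their next check and skip the wait entirely. */
    pthread_mutex_lock(&lock);
    ready = true;
    pthread_cond_broadcast(&cv);
    pthread_mutex_unlock(&lock);

    for (int i = 0; i < created; i++)
        pthread_join(tids[i], NULL);
    return woken;
}
```

For example, `run_broadcast_demo(4)` should return 4 regardless of how the threads are scheduled. In this sketch, correctness rests on `ready` being changed only while the mutex is held. As far as wakeups are concerned, moving `pthread_cond_broadcast()` after the unlock should not lose one, because a consumer can only block after re-checking `ready` under the mutex.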
However, I am concerned that you introduce this scenario with "where it's not atomic". Are there cases where cond_wait() is not atomic, such that it is necessary to write code to take that into account? So my question remains: what are these correct but non-atomic cond_wait() implementations like, and how would putting the broadcast() under the protection of the mutex help?
I should also say that I have not looked at the particular callback code at issue here. Perhaps it uses broadcast() in some unusual fashion (i.e., not following the producer/consumer model) that affects this discussion.
Ted

_______________________________________________
OpenAFS-devel mailing list
https://lists.openafs.org/mailman/listinfo/openafs-devel
